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This paper presents a tutorial review of recent advances in the field of time-frequency (t, /) signal 
processing with focus on exploiting (t, /) image feature information using pattern recognition techniques 
for detection and classification applications. This is achieved by (1) revisiting and streamlining the 
design of high-resolution quadratic time frequency distributions (TFDs) so as to produce adequate (t, /) 
images, (2) using image enhancement techniques to improve the resolution of TFDs, and (3) defining new 
(t, /) features such as (t, /) flatness and (t, /) entropy by extending time-domain or frequency-domain 
features. Comparative results indicate that the new (t, /) features give better performance as compared to 
time-only or frequency-only features for the detection of abnormalities in newborn EEG signals. Defining 
high-resolution TFDs for the extraction of new (t, /) features further improves performance. The findings 
are corroborated by new experimental results, theoretical derivations and conceptual insights. 

© 2015 Published by Elsevier Inc. 



1. Introduction 

1.1. Non-stationary signals 

There has been a significant effort in the past decades to de- 
velop Digital Signal Processing (DSP) methods that are more ap- 
propriate for the representation, analysis and processing of non- 
stationary signals [1]. The studies initially focused on the design 
of Spectrograms, filter-banks and quadratic methods related to 
the Wigner-Ville Distribution (WVD) for optimal representation of 
signals such as mono-component Linearly Frequency Modulated 
(LFM) signals used in radar [2], sonar [3], geophysics [4], and other 
applications [5-8]. Researchers then shifted their focus to the anal- 
ysis of multi-component signals, resulting in a number of new ap- 
proaches to design advanced Time-Frequency Distributions (TFDs) 
and related techniques [9-14,8]. Such multi-component signals 
provide a more precise modeling of most natural signals, includ- 
ing biomedical signals such as fetal movement signal, phonocar- 
diogram (PCG), electrocardiogram (ECG), speech, electroencephalo- 
gram (EEG), heart rate variability (HRV) and many others. A typical 
model for such signals was originally proposed in ([15], p. 419). 
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Following this approach, a multi-component signal of time dura- 
tion T and bandwidth B can be expressed as: 

N c N c 

s(t) = £s k (t) = J^a k (t)cos(</>k(t)), (1) 

k = 1 k= 1 

where N c is the total number of components, s/<(t ) is the kth sig- 
nal component, aj c (t) and are the instantaneous amplitude 
(IA) and instantaneous phase (IP) of the /cth signal component. The 
IA a/<(t) is a low-frequency amplitude whose spectrum is assumed 
not to overlap with the spectrum of the higher-frequency signal 
cos (0fc(t)) [16,17]. The derivative of the IP //<(t) = defines 

the instantaneous frequency (IF) of the /cth signal component. The 
above defines the AM-FM model of the signal where the IA is the 
AM law and the IF is the FM law of the signal. The time support 
of aj c (t) is T and the frequency support of //<(t) is B. 

Using the above model, a multi-component signal is completely 
characterized by the number of components N Cf the IA and IF of 
each component; residual noise could be added in certain situa- 
tions when there are interferences. The time-frequency ( t , /) ap- 
proach is a preferred method to represent such multi-component 
signals as it allows solving the problem of estimating the charac- 
teristics of the components constituting the signal in Eq. (1). For 
example, it can provide a measure of the energy leakage that takes 
place around the components due to slow variation of their IAs, 
and a measure of the relative energy distribution throughout the 
(t, /) plane. Such information can then be used to refine a number 
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of advanced techniques and methodologies in applications such as 
detection and classification using selected ( t , /) features for pat- 
tern recognition. An important generic application of such methods 
is diagnosis, which is related to condition monitoring and fault de- 
tection. This applies to both machines and humans. For the sake 
of illustration, we will use physiological signals in this paper, al- 
though the method can be applied to all fields where decisions 
need to be made on the basis of information collected from non- 
stationary signals. 

1.2. Pattern recognition for diagnosis , condition monitoring and fault 
detection 

The basic idea is to find and recognize patterns in the sig- 
nal that would indicate the presence of a fault, an abnormality 
or otherwise would indicate a normal condition for the process 
generating the signals. If there is an abnormality, then one needs 
to identify and classify the abnormality. Although the approach 
is general, this paper illustrates it by focusing on some particu- 
lar physiological signals such as electroencephalography (EEG) for 
monitoring new-born brain activity and predicting newborn health 
outcomes through diagnosis and prognosis. 

1.2.1. Acquisition and representation of real signals 

Although the methodologies presented are general, this paper 
considers only physiological signals for illustration, including EEG, 
electrocardiography (ECG), heart rate variability (HRV) and heart 
sounds which are routinely used by medical specialists for diagno- 
sis in a range of situations. After acquisition, these signals are usu- 
ally processed in several stages, including a pre-processing stage 
like amplification and filtering to suppress artifacts and noise. In 
some situations, multi-channel physiological signals are acquired 
using multiple spatially distributed sensors that provide both tem- 
poral and spatial information. The diversity given by space time in- 
formation can allow a more refined analysis and better interpreta- 
tion of the data in applications such as source localization, signal- 
to-noise ratio (SNR) enhancement, classification, and removal of 
artifacts [18,19]. As indicated earlier, ( t , f) analysis is widely used 
for the representation and analysis of physiological signals because 
of the non-stationary and multi-component characteristics of these 
signals; the (t, /) image allows a refined treatment of further pro- 
cessing stages such as detection and classification, leading to diag- 
nosis, condition monitoring and fault detection. 

1.2.2. Automated TF detection of abnormalities in non-stationary 
signals 

Early detection of abnormalities in physiological signals can 
help save or improve patients’ lives. Manual detection of abnor- 
mality requires a constant monitoring of the relevant physiolog- 
ical signal by a qualified medical expert. The development of an 
automatic abnormality detection technique, which can be imple- 
mented on a digital computer, would result in a major advance 
in medical practice. Such detection of abnormality in physiolog- 
ical signals can be considered as a pattern classification prob- 
lem involving extraction of features from the signals and training 
of a classifier. Features can be extracted from the time-domain 
representation [20-22], the frequency-domain representation [23], 
a combination of both time-domain and frequency-domain rep- 
resentations 24], or joint time frequency representations using 
TFDs [7,25,5,26-30,13,31]. The performance of abnormality detec- 
tion techniques depends of course on the choice of signal rep- 
resentation and selection of features. The abnormality detection 
techniques that are based on TFDs are shown to outperform time- 
domain or frequency-domain only approaches as TFDs are more 
adapted to analyze the non-stationary characteristics of physiolog- 
ical signals [26] and convey critical information about the signal. 




Fig. 1 . The proposed (t, f) pattern recognition approach. 



TFDs are normally compared in terms of their resolution and 
cross-term suppression properties. Previous studies have shown 
that high-resolution TFDs, like the Modified B-Distribution (MBD), 
give good classification results [25,26]. A key question is therefore 
whether the performance of TFD-based signal classification tech- 
niques can be related to the resolution of the TFD. This paper aims 
to answer this question by investigating the influence of resolution 
parameters on classification performance. 

TFDs are rich in information partly because they increase the 
dimensionality of the classification problem at the expense of in- 
creased computational costs. In order to avoid this problem, it is 
important to extract only the most relevant features from the TFD. 
Such features can be extracted in a number of ways. These include 
dividing a TFD into a number of tiles and computing the energy in 
each tile [6], reducing the dimension of the TFD by linear trans- 
formation methods [5,30,31], selecting relevant points from a TFD 
based on relevance or mutual information measure [29], translat- 
ing time-domain or frequency-domain features to (t, f) domain 
and extracting relevant features [26], and using image processing 
techniques to extract features like statistical measures, texture re- 
lated features, geometric features, ridges [25,28], to name just a 
few. 

1.3. Scope and key topic covered 

This study reviews previous work on the design of high- 
resolution TFDs [32,10], (t, /) image processing and feature extrac- 
tion methodology [25,26] with three objectives: (1) streamlining 
the design of high-resolution TFDs, (2) improve diagnostic applica- 
tions and (3) providing the reader with an illustration on physio- 
logical signal classification techniques. The study refines, updates 
and extends previous studies with improved precision and illus- 
trates the improvements using a real application: enhancing new- 
born health outcomes. The style of this paper is a tutorial review 
with scope and coverage defined by the inclusion of the following: 

1. A streamlined review of quadratic (t, f) distributions (QTFDs) 
formulations; 

2. A methodology for the design of new high-resolution QTFDs 
and the study of how they improve the performance of fea- 
tures extraction; 

3. A review of multi-component IF estimation techniques as a 
performance measure to compare TFDs; 

4. A review of (t, /) image processing techniques for cross-term 
suppression, resolution enhancement and de-noising as a pre- 
processing stage before classification; including the design of a 
(t, /) image enhancement technique based on directional fil- 
tering to improve the resolution of QTFDs; 

5 . A formulation of (t, f) translated features from time-domain 
only and frequency-domain only features; 

6. Illustrative experiments to apply the above points to EEG 
seizure detection using a large medical database. 

This study develops a (t, /) pattern recognition methodology to 
link design of high-resolution TFDs, (t, f) image enhancement 
techniques, and feature extraction stages as illustrated in Fig. 1. 
Note that the authors have restricted to the review of only QTFDs 
because of their high resolution, simple interpretation as a dis- 
tribution of signal energy in a (t, /) plane and widespread use. 
The rest of the paper is organized as follows. The design of high- 
resolution QTFDs is discussed in Section 2. Multi-component IF 
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estimation techniques are described in Section 3 and compared in 
terms of performance. Then, Section 4 presents a review of image 
processing techniques for enhancement of TFDs. The methodology 
for extracting (t, /) features is described in Section 5. Experi- 
mental results are described and discussed in Section 6. Finally, 
Section 7 concludes the paper. Appendix A describes the resolution 
measure used in this study to compare the performance of TFDs. 
The issues related to computational cost and real time implemen- 
tation are discussed in Appendix B. In addition, the programs that 
are used in this study are described in Appendix C. Appendix D 
gives a list of (t, /) features obtained by extending time-only fea- 
tures to the (t, /) domain. Appendix E refers the reader to the real 
data used in this study which appear as Supplementary material. 

2. Quadratic time-frequency signal processing 

2.1. Time frequency formulations and representations 



The selected compromise for this trade-off selection depends on 
the signal specifications. High-resolution TFDs are needed for accu- 
rate estimation of signal parameters such as the IF, total number of 
components which are important features for a number of applica- 
tions including signal modeling, analysis and classification. The es- 
timation of QTFDs based on the general (t, f) formulation requires 
one FT and a two dimensional (2-D) convolution along time and 
frequency axis. Along with this general (t, /) formulation, a QTFD 
can be expressed starting from a time-lag formulation, Doppler- 
lag formulation or a Doppler-frequency formulation, as outlined in 
the next sections. 

2.12. Time-lag formulation of QTFDs 

Eq. (6) can be estimated via time-lag formulation by replacing 
the convolution operation along the frequency axis with the multi- 
plication operation along the lag axis ([33 , p. 67), resulting in the 
following expression: 



Let us consider a non-stationary real signal s(t) as defined in 
Eq. (1). The WVD is the core member of the class of QTFDs. The 
WVD is defined by taking the Fourier transform (FT) of an instan- 
taneous auto-correlation function I< z (t, r) expressed as 1]: 

W z (t,f) = Jl< z (t,T)e- 2jnfT dT, (2) 

R 

where I< z (t, r) is defined as 

(3) 

and where z(t) is the analytic associate of a real signal s(t) ob- 
tained by the use of the Hilbert transform; expressed as 



z(t)=s(t) + j"H{s(t)}. (4) 

In the above expression, the imaginary part represents the Hilbert 
Transform defined by H{s{t)} = ^p.v.{/ R f^dr}, where p.v. repre- 
sents a principle value expressed as 



1 f s 



s ' r -dr [ = lim 

T | S^O 



t—S +00 

dr + f^ 
J t-T J t- T 

oo t +5 



dr 



(5) 



For a real AM-FM signal s(t) = a(t) cos(</>(t)), the analytic signal 
z(t ) can be written as: z(t ) = a(t)e^^\ under some conditions 
such as when a(t) is a low-frequency amplitude whose spec- 
trum does not overlap with the spectrum of the high-frequency 
signal [16,17]. For such signals with large Bandwidth-Time- 
duration (BT) product, then the IF provides an estimate of the FM 
law. 



2.1.1. Direct quadratic time-frequency formulation 

The WVD defined in Eq. (2) gives ideal concentration for mono- 
component LFM signals, but it produces undesired cross-terms for 
non-linear frequency modulated (FM) or multi-component signals. 
Cross-terms can be reduced by convolving the WVD with a 2D 
(t, /) kernel, resulting in the following expression [1], 



P(f,f)= f G(t, r) * I( z (t, r) e 2j7TfT dr , (7) 

R " ' 

RzQ,t ) 

where G(t, r) is called the time-lag kernel of the TFD and is re- 
lated to y(t, f) by inverse FT. The time-lag formulation is widely 
used for implementing TFDs because of its conceptual simplicity 
and computational efficiency as it requires only one convolution 
along the time axis and one FT from lag domain to frequency do- 
main [13]. 

2.1.3. Doppler-frequency formulation of QTFDs 

A Doppler-frequency based formulation of TFDs can be ob- 
tained from the (t, /) formulation by replacing the time convolu- 
tion in Eq. (6) by a multiplication in Doppler after taking a FT [33]: 

Pit, f) = f Giv, f)*k z (v, f)e 2j7ZVt dv, (8) 

R 

where Q(v, f) is the kernel in the Doppler-frequency domain. The 
quantity k z (v, /) is often referred to as the spectral autocorrelation 
function (SAF) defined as ([33 , p. 70): 

k z (v,/) = z(7+0Z*(/-0, (9) 

where Z(/) is the FT of z(t). The Doppler-frequency TFD formu- 
lation requires one convolution along the frequency axis and one 
inverse FT (from Doppler to time) to transform the SAF to the (t, /) 
domain representation. The computational cost of transforming the 
SAF to the ( t , /) domain is equal to that of transforming the in- 
stantaneous auto-correlation function to the (t, /) domain, apart 
from the required estimation of Z(/). 

2.1.4. Doppler-lag formulation of QTFDs 

A Doppler-lag formulation can be obtained by replacing the 
convolution operation along the time axis in Eq. (7) with the 
equivalent multiplication along the Doppler-axis ([33], p. 69), re- 
sulting in the following expression: 



p(tJ) = Y(t,f)**W z (tJ) (6) 

where y(t, f) is the 2D (t, /) smoothing kernel. Eq. (6) represents 
the (t, /) formulation of QTFDs. The 2D smoothing of the WVD 
with y(t,f) reduces the cross-terms, but blurs the auto-terms. 
For this reason, the (t, /) kernels are designed to achieve the best 
tradeoff between the following two conflicting objectives: 



p(t,f) = J J g(V , T)A z (V, T) e 2 j^/r +2 j7Ttv d^ d v , 



A z (v,r) 



( 10 ) 



where A z (v, r) represents the ambiguity function, defined as: 



A z (v,t) = 



/ 



K z (t, r)e~ 2j7rtv d t, 



(H) 



a) Minimize cross-terms; and g(v, t) = / R G(t, r)e 2jjrtl, dt is the Doppler-lag kernel. This 

b) Retain the resolution of auto-terms. formulation is often used for designing high-resolution TFDs 
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as it allows entering filter specifications in the formulation of 
g(v, r) [33]. The Doppler-lag formulation requires three FTs as 
computing the ambiguity function from the time-lag kernel needs 
one FT and transforming the ambiguity function to the TFD re- 
quires two FTs. 

These four formulations are illustrated in Fig. 3. 

The selection of one of the above four formulations depends 
on the specifications of the application and signal characteristics 
such as duration T, bandwidth B, and value of BT product. For 
example, for narrow-band signals (i.e. with small B ), Eq. (8) would 
be preferable to use ([33], p. 70). 



2.1.5. Which TFD to use? Spectrogram as a non-separable QTFD , WVD, 
S-method or another TFD? 

A question asked by most beginners is which TFD to use to get 
started, given that the WVD has a good resolution with a prob- 
lem of cross-terms, the Spectrogram has a problem of resolution, 
but no apparent cross-terms and other QTFDs appear more compli- 
cated. To answer this question, let us first review the Spectrogram 
in detail. 

The Spectrogram is a simple yet effective TFD in some situa- 
tions. It is defined as: 






|/ s(r)w(t-r)e 2j7r ^ T dr 



( 12 ) 



where w(t) is the analysis window. The Spectrogram is a QTFD 
as its time-lag kernel can be expressed as the instantaneous auto- 
correlation function of the window ([33], pp. 47-48). 

C(t,r) = w*^t+^jw|t-0 (13) 

This implies that the (t, /) kernel of the Spectrogram y(t, f ) is the 
WVD of the window w(t), and equivalently the ambiguity domain 
kernel filter g{v , r) is the ambiguity function of the window w(t) 
([33], p. 76), 

K(t,/) = J w*ft + 0 w ( f_ ^)e~ 2]7TfT dr. (14) 

R 



9i(*KW) 




9i(f)S2(r) G,MG 2 (/) 



G 1 M. 92 M 

Fig. 2. The relationships between equivalent formulations of separable (t, /) kernels 
in several dual domains. 

SM s (t, /) = 2 j G(9)F s (t, f + 9)F*(t, f - 9)d9, (16) 

R 

where F s (t, f) is the short-time Fourier transform of s(t), defined 
as: 





F s (t , 




w(r)s(t + r)e j27r ^ T dr. 



(17) 



In Eq. (16), G(0) is a narrow window whose length controls the 
cross-term suppression and auto-term resolution properties of the 
TFD. Previous studies have shown that an appropriate selection of 
the length of G(6 ) can combine advantages of both Spectrogram 
and WVD. The resultant TFD can have auto-term resolution close 
to the WVD with a significant suppression of cross-terms [35]. The 
ambiguity domain kernel of the S-method is given below 13]. 



g(y, r) = G(v)* 



/"HM-jk' 



j2nuv du 



= G(v)* A w (y, t). 



(18) 



Eq. (16) shows that the S-method is computed using the short- 
time Fourier transform, which makes it more computationally ef- 
ficient than other QTFDs such as the WVD. Let us now review the 
design of separable kernel TFDs. 



To discuss the limitations of the smoothing kernel of the Spec- 
trogram, let’s consider a Gaussian window w(t) = (a/ 7 r) 1 / 4 e _ 2 t . 
Using Eq. (14) results in the (t, /) kernel for this window ex- 
pressed as 34]: 

ni 2 47T 2 / 2 

Y(t, /) = 2e at e « . (15) 

The above expression shows that the smoothing kernel of the 
Spectrogram is non-separable given that both t and / variables 
are parametrized by the same scale value. The smoothing along 
the time axis cannot be controlled independently from the one 
in the frequency axis and vice-versa; this makes the Spectrogram 
extremely sensitive to the length of the window ([33], p. 221). At- 
tempts to mitigate this limitation of the Spectrogram include using 
separable kernel TFDs that have the flexibility to independently ad- 
just the type of smoothing along the time or frequency axis as 
discussed in Section 2.2. 

In short, the Spectrogram is an easy to use QTFD that does not 
have a problem with cross-terms, but suffers from poor resolution 
as well as sensitivity to the choice of window length. On the other 
hand, the WVD has higher resolution for mono-component LFM 
signals, but has an issue of cross-terms that may be problematic 
for analysis and modeling. 

The S-method, which can be considered an optimized version 
of the Spectrogram, combines the advantages of the Spectrogram 
and WVD. It is defined as [35]: 



2 . 2 . High-resolution TFDs design based on separable kernels 

2.2.1. Definition of separable kernel TFDs 

An attempt at simplification is to first consider (t, /) kernels 
that can be represented simply as the product of an indepen- 
dent time-only kernel with an independent frequency-only kernel; 
these separable (t, /) kernels ([ 33 ], pp. 213 - 222 ) are expressed as: 

y(t,f)=gi(t)G 2 (f). ( 19 ) 

The meaning and significance of this simplification is that we can 
define and design some QTFDs simply by smoothing the WVD in 
t and then in /. The shape and size of gi(t) or G 2 (f) deter- 
mines the smoothing along the time or frequency axis respectively. 
In many applications, separable (t, /) kernels have shown to give 
good enough results in terms of cross-term suppression and auto- 
term resolution, making them popular as they are easy to use. 
Other equivalent formulations of ( t , f) separable kernels are given 
below in other related domains. 

• Time-lag domain: G(t, r) = gi(t)g 2 (T) 

• Doppler-frequency domain: Q(v, f ) = G^(v)G 2 (f) 

• Doppler-lag domain: g{v , r) = G\(v)g 2 (z) 

Fig. 2 illustrates the relationships between the above formulations 
of the (t, /) kernels. 
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— G(t,r) * K z (t,r ) = &(v,f) * kfy,f) 



A(v, r) = g(v , r) A(v, r) 

Fig. 3. The relationships between equivalent formulations of TFD in several dual 
domains. 

As discussed above, high-resolution separable kernel TFDs can 
be designed in the four possible domains, that is: (t, /) domain, 
Doppler-frequency domain, time-lag domain or Doppler-lag do- 
main; they are related to each other by FT as illustrated in Fig. 3. 
The ambiguity domain (lag-Doppler domain) is usually preferred 
for designing TFDs as filtering in the ambiguity domain is per- 
formed by a multiplication of lag and Doppler windows instead 
of one or two convolutions in other domains. 

2.2.2. Cross-terms in the WVD and ambiguity domain 

The WVD defined in Eq. (2) is often considered as the core dis- 
tribution of QTFDs; it suffers from the presence of undesired arti- 
facts or cross-terms that prevent its straightforward interpretation 
as an energy distribution versus t and /. These cross-terms can 
be classified into two types: outer- terms and inner- terms. Outer- 
terms can be easily explained by considering a two component 
signal: z(t) = z^(t) + Z2(0 for which the WVD is given by 

W z (t, f ) = W Z| (t, f ) + W Z2 (t, f ) 

+ 2Re j J Z\ + Cjz* _ ^je~ 2]7Z fz dr J . (20) 

R 

W Zl z 2 (t,f) 

In the above expression, W Zl (t, /) and W Z2 (t,f ) represent auto- 
terms that describe an energy concentration while W ZlZ2 (t, /) rep- 
resents outer-term artifacts that are generated by the interaction of 
signal components zi(t) and Z 2 (t). The second type of artifacts is 
called inner-terms; they are generated by the non-linear FM char- 
acteristic of mono-component signals. 

A non-linear mono-component signal can be approximated by 
the summation of a number of piece-wise LFM signal components. 
For example, a single non-linear FM signal component z\ (t) can be 
written as: 

Zl(t)=Zl,a(0 + Zl,b(t), (21) 

where zi >a (t) and zi^t) represent respectively the first and the 
second pieces constituting the signal zi(t). Its WVD is given by: 



inner-terms is partly orthogonal to the IF of a non-linear FM signal 
([33], p. 63). 

The ambiguity function is often used for designing QTFDs be- 
cause it is the 2D-FT of the WVD, which then allows filtering in the 
(t, /) domain by a multiplication in the ambiguity domain. Specif- 
ically, Eq. (6) and Eq. (10) show that TFDs can be expressed as the 
result of a double convolution of (t, f ) kernels with the WVD in 
the (t, /) domain or as the 2D-FT of the product of the Doppler- 
lag kernel with the ambiguity function. This leads to the following 
simple procedure for designing the QTFDs. 

In the ambiguity domain, the cross-terms seem to appear away 
from the origin (see Fig. 4(b)) and are separated by a distance that 
is approximately equal to the frequency spacing in between two 
signal components as illustrated by Fig. 4(b). On the other hand, 
the auto-terms pass through the origin and appear between two 
cross-terms in the ambiguity domain. We note that the energies 
of the auto-terms are mostly concentrated around the origin as 
illustrated in Fig. 4(b). 

QTFDs can then be defined to exploit the geometrical proper- 
ties of auto-terms and cross-terms in the ambiguity domain by 
designing kernels using the “almost low-pass” characteristics of 
auto-terms and the high-pass characteristics of cross-terms in the 
ambiguity domain, resulting in kernels that discriminate and sepa- 
rate them. The cross-terms can therefore be reduced by applying a 
suitable 2D filter in the ambiguity domain and then transforming 
the ambiguity function back to the (t, /) domain. 

The exact specifications of a 2D filter depend on the nature of 
the signal being analyzed. For example, for Fig. 4(b), a 2D filter that 
is elongated along the lag axis with low-pass characteristics along 
the Doppler axis will lead to a higher-resolution TFD. The result is 
shown in Fig. 4(c); and for illustration of this principle, the next 
subsection presents simple examples of relevant separable kernels 
that lead to high-resolution QTFDs. 

2.2.3. Principles of separable kernel TFDs design 

To illustrate this principle in clear terms, let us consider two 
special cases: lag-independent TFDs and Doppler-independent 
TFDs. These two cases of separable kernel TFDs can be used in 
certain specific situations related to signal characteristics, resulting 
in easy designs and implementations. 

a) Lag-independent (LI) TFDs: Such lag-independent TFDs only 
perform smoothing along the time axis and are therefore char- 
acterized by G 2 (/) = 8(f) or g 2 (t) = 1. This kind of kernel 
is suitable for signals whose IF is parallel to the time axis in 
the ( t , /) domain. In the ambiguity domain, such signals have 
most of their auto-term energy concentrated along the lag axis 
as these signals appear as an impulse along the frequency axis 
in the (t, /) domain, and as a constant along the lag axis in 
the ambiguity domain. The MBD kernel represents such an ex- 
ample of LI-TFD. It is defined in the ambiguity domain by the 
following kernel [1]: 





W Zl (t,/) = W Zl a (t,/) + W Zu (t, /) 

+ 2Re j J Z\,a(t+ 0z* b (t- Cje~ 2j7TfT dx . 

R 



( 22 ) 

Cross-terms are oscillatory in nature, with the same order of mag- 
nitude as that of auto-terms [36 , and they lie half-way in-between 
two signal components in (t, /) plane as illustrated in Fig. 4(a). 
The direction of oscillation of outer-terms is orthogonal to the line 
joining the two components while the direction of oscillation of 



g(v,r) = Gi(v) = 



ITQ8 + j7rv)| 2 
r 2 OS) 



|V| < T, 0</S<l 

(23) 



where v and p are bounded to ensure that the MBD is a low- 
pass filter. Some physiological signals like HRV and EEG seizure 
have negligible FM; the direction of oscillation of outer- terms 
for such signals is parallel to the time axis. The MBD, a lag- 
independent distribution, is naturally suited for the analysis of 
such signals as it only performs smoothing along the time axis, 
thus avoiding the blurring of auto-terms in all directions usu- 
ally caused by a 2D smoothing. To illustrate, let us consider a 
multi-component signal composed of two tones and one LFM, 
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Fig. 4. Illustration of inner-term and outer-term artifacts using the sum of a quadratic FM signal and a linear FM signal, (a) (t, /) domain representation using the WVD. 
(b) Ambiguity domain representation, (c) ( t , f ) domain representation using the extended modified B distribution (EMBD). 



and analyze it using the MBD, windowed WVD (discussed in 
the next subsection), and Hamming-Hanning kernel TFD [32] 
as shown in Fig. 5. The MBD, in Fig. 5(a), gives high energy 
concentration for the two tones but fails to give the same level 
of energy concentration for the LFM component as it performs 
smoothing only along the time axis. The Hamming-Hanning 
kernel TFD, shown in Fig. 5(b), gives a blurred representa- 
tion for all the signal components due to its 2D smoothing 
kernel. The windowed WVD in Fig. 5(c) fails to suppress the 
cross-terms due to its frequency-only smoothing that is al- 
most orthogonal to the direction of oscillation of cross-terms. 
Of course, for all the TFDs, further improvements can be made 
by optimizing window size and window shape following tra- 
ditional DSP filter design criteria, 
b) Doppler-independent (DI) TFDs: Such Doppler-independent 
TFDs only perform smoothing along the frequency axis and 
are characterized by gi(t) =S(t) or Gi(v) = 1. These DI types 
of kernels are appropriate for signals whose auto-terms are 
parallel to the frequency axis in the (t, /) domain. In the 
ambiguity domain, such signals have most of their auto-term 
energy concentrated along the Doppler axis. A simple example 
of a DI-TFD is the WVD windowed with a Hamming window 
along the lag axis expressed as: 

g(v, r) = g 2 (x ) = 0.54 - 0.46cos(27TT), -0.5 < r < 0.5, 

( 24 ) 



where r is bounded to ensure that Hamming is a low-pass 
filter. 

The DI-TFD is suited to real-world signals that have the char- 
acteristics described above, like for instance EEG spike signals 
which can be modeled by a train of impulses in the time- 
domain. The direction of oscillation of outer-terms for such 
signals is parallel to the frequency axis. Methods, such as the 
windowed WVD defined by Eq. (24), are more suitable for 
the analysis of this type of signals as they perform smoothing 
along the frequency axis only and avoid unnecessary blurring 
caused by a complete 2D filter. Fig. 6 shows the MBD, win- 
dowed WVD, and Hamming-Hanning kernel TFD of a signal 
composed of two impulses in the time domain and an LFM 
signal. The MBD, shown in Fig. 6(a), gives poor energy con- 
centration for all the signal components as the direction of its 
smoothing kernel is orthogonal to the direction of energy con- 
centration of the signal components. The Hamming-Hanning 
kernel TFD, shown in Fig. 6(b), performs 2D smoothing and 
therefore blurs the signal auto-terms, but its resolution is bet- 
ter than the MBD. The windowed WVD, shown in Fig. 6(c), 
results in a high-resolution (t, /) representation for the two 
impulses but it fails to give same resolution for the LFM com- 
ponent as it only performs smoothing along the frequency 
axis. 

Apart from the LI and DI TFDs, most of the other separable 
kernel TFDs, like the B-distribution, the extended Modified B- 
distribution and the Compact support kernel employ smooth- 
ing filters along both time and frequency axes [33,32,10]. Some 
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Fig. 5. (t, /) representations of a signal composed of two tones and one LFM. (a) The MBD (ft = 0.05). (b) The Hamming-Hanning kernel TFD (window length = 32 s for 
both Doppler and lag axes), (c) The windowed WVD (rectangular window length = 31 s). 





Fig. 6. TFD representation of a multi-component signal composed of two impulses and one LFM in the ( t , f ) domain using the (a) MBD (ft = 0.5), (b) Hamming-Hanning- 
kernel TFD (window length = 32 s for both Doppler and lag axes), (c) windowed WVD (rectangular window length = 20 s). 



examples of these particular kernels previously used in the lit- 
erature or recently developed are given below. 

2.2.4. Specific separable kernel TFDs 
i. B-distribution Kernel ([33], p. 217): The B-distribution kernel 
is defined in the ambiguity domain as follows: 



g(V,T) = g 2 (T)C 1 (v) = 



, .p \r(j[+prv)f 

2 1 -^r(2 J 8) ’ 



|v| < 0.5, |r| < 0.5, and 0< ft <\, 



(25) 



where r, v and /3 are bounded to ensure that the B- 
distribution contains a low-pass filter. 
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Fig. 7. (a) The EMBD kernel with dominant lag-direction, (b) The EMBD with dominant Doppler-direction. 



The B-distribution kernel is the product of a low-pass filter 
Gi(v) = and a high-pass filter g 2 (r) = \t\P. The 

B-distribution gives a high-resolution TFD for selected small 
values of p [37]. A disadvantage is that a zero at the origin 
appears as a consequence of the high pass filtering performed 
along the lag axis i.e. g 2 (0) = |0|^ = 0. This results in a deteri- 
oration of the resolution of the B-distribution for certain types 
of signals [32]. This fact motivated its modification as detailed 
below. 

ii. Modified B-distribution: As explained above, the MBD-kernel 
was designed to overcome some of the problems of the B- 
distribution, based on the following observations 

1 The B-distribution usually gives high resolution (t, f ) rep- 
resentation for very small values of p (0 < p <$; 1) ([33], 

p. 118). 

2 For small values of p, the lag window is approximately an 
all pass filter (except for r = 0 ), i.e., g 2 (t) ~ 1. 

The kernel of the MBD includes only the Doppler window of 
the B-distribution kernels so as to avoid the distortion caused 
by high pass filtering along the lag-axis as shown in Eq. (25). 
The consequence is that the MBD gives the highest energy 
concentration only for signals whose IF does not vary rapidly 
with time, such as EEG seizure signals [38]. 

iii. Hamming-Hanning TFD: The corresponding Hamming-Hanning 
kernel is a 2D low-pass filter expressed as [32]: 



g(v, r) = (0.54 - 0.46 cos(27rr)) (0.5 - 0.5cos(7rv)), 

1111 

-- < V < -, -- < T < — . 



(26) 



This kernel results in a high-resolution TFD obtained by choos- 
ing a 2D low-pass filter along both lag and Doppler axes. This 
example is provided merely to show that any of the classical 
windows used for spectral analysis or digital filter design can 
be used as kernel filters, with specifications provided in the 
ambiguity domain. 

iv. Extended modified B-distribution: As discussed earlier, pre- 
vious studies have reported that the kernel of the lag- 
independent MBD mentioned in Eq. (25) gives optimal energy 
concentration for signals with negligible FM ([33], pp. 213- 
222), i.e., for signals whose IF is almost parallel to the time 
axis. The drawback of the lag-independent formulation of the 
MBD is that it does not allow smoothing along the frequency 
axis, thus limiting its scope for the analysis of signals whose 



auto-terms are parallel to the time axis. For example the MBD 
cannot be used for the analysis of a signal that is composed 
of a train of impulses as observed in some newborn EEG sig- 
nals, as it then would give poor results [1]. The EMBD exhibits 
improvement in some situations as it extends the MBD by 
applying its kernel filter along both lag and Doppler axes, re- 
sulting in the expression 32]: 



g(v, T) = 



\r(P + jnv)\ 2 \T(a + jnx)\ 2 



r 2 (p) 



r 2 (a) 



(27) 



where -0.5 < v < 0.5, - 0.5 < r < 0.5, 0 < p < 1 and 0 < 
a < 1. The lengths of the Doppler and lag windows are con- 
trolled by separate parameters a and p respectively. The extra 
degree of freedom in the formulation of the EMBD allows to 
independently adjust the lengths of the windows along both 
lag and Doppler axes as illustrated in Fig. 7. This makes it a 
more useful tool for the analysis of real-life signals, such as fe- 
tal movement signals, EEG signals with seizure and EEG spike 
signals. However, the EMBD has only one parameter for each 
kernel filter or window (that is a for lag and p for Doppler) 
to control both shape and size. Thus the EMBD does not allow 
adapting both length and shape of the smoothing window of 
the kernel filter independently, although it is an improvement 
on both the BD and the MBD. 

v. Compact support kernel TFD: Such compact support kernels 
(CSK) are designed to vanish outside a given range in the am- 
biguity domain. Unlike Gaussian windows they do not have 
infinite length, so there is no need to truncate those using 
rectangular windows that may cause loss of information. These 
TFDs have been shown to outperform other kernel-based 
methods in terms of their ability to suppress cross-terms while 
retaining the resolution of auto-terms in some cases [10]. Such 
high-resolution performance is achieved by these kernels by 
combining their compact support with a flexibility to adjust 
both shape and size independently, as discussed below. 

In order to explain the characteristics of the CSK, let us first 
observe in the formulation below that its kernel includes two 
components [10]: 



g(v, T) = 



I cD 2 . cD 2 
(p- c e v 2 — D 2 t 2 -D 2 ^ 

o, 



I v 2 <D 2 
1 r 2 <D 2 
otherwise 



(28) 



In the above, the two branches formed by the Doppler window 
and lag window of the CSK are respectively given by 
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Fig. 8. (a) The CSK as a Quasi-Rectangular window, (b) The CSK as a quasi-Gaussian window. 





Fig. 9. (a) The Spectrogram (Bartlet window length = 55 s). (b) Hamming-Hanning kernel TFD (lag window length = 128 s, Doppler window length = 0.07 Hz). 



f cD 2 




Gi(v) = 1 e c e v2 ~ D2 , 


\v\<D 


1 o. 


otherwise 


( cD 2 




gi( T)=\e c e* 2 - D2 ’ 


|T| <D 


|o. 


otherwise 



The above equations show that the shape and size of both 
Doppler and lag windows are determined respectively by the 
parameters c and D. These parameters allow the CSK to adapt 
both kernel length and shape. For example, the shape of the 
CSK in the ambiguity domain can be adjusted from the quasi- 
rectangular window to the quasi-Gaussian window as shown 
in Fig. 8. However, the CSK definition restricts the Doppler and 
lag windows to be of same length, so that the smoothing along 
time or frequency axis cannot be adjusted independently. This 
limitation is overcome by a new design procedure described 
in Section 2.3. 

2.2.5. Performance comparison: separable kernel TFD vs spectrogram 
To illustrate the potential advantage of using separable kernel 
TFDs over the Spectrogram in terms of energy concentration and 
resolution properties, let us consider a two component signal de- 
fined as: 



s(t) = 



cos (2 n (0.2 1 - 0.0006t 2 )) + cos (27T (0.3t + 0.0006t 2 )), 
0 < t < 127, 

0, otherwise, 



( 30 ) 



The Spectrogram and Hamming-Hanning kernel based TFD for this 
signal are shown in Fig. 9. The parameters of both TFDs were op- 
timized based on visual analysis. 

Fig. 9 shows that the separable kernel TFD gives high energy 
concentration for the given signal as compared to the Spectrogram. 
This improved performance of the Hamming-Hanning kernel based 
TFD is due to additional flexibility in its formulation to indepen- 
dently adjust smoothing along time and frequency axes. 

2.3. An advanced TFD design with improved resolution: the CI<D 

2.3.1. Non-uniform compact support kernel 

Eq. (29) indicates that neither the shape nor the size of the 
Doppler and lag windows can be adjusted independently of each 
other. This shortcoming limits the application of the CSK TFD to 
the analysis of signals whose energy is homogeneously distributed 
in the ambiguity domain. As a consequence, the CSK TFD cannot 
deal optimally with signals like EEG seizure signals, whose IF is 
almost parallel to the time axis, or EEG spike signals, whose IF is 
almost parallel to the frequency axis. 

To illustrate the limitations of the CSK TFD, let us consider a 
multi-component signal composed of two LFM signals. Fig. 10(a) 
shows the ambiguity function modulus of this signal and a CSK 
with optimal parameters. The CSK, even with optimal parameters, 
fails to retain all the auto-term energy in the ambiguity domain, 
thus resulting in a blurred TFD as shown in Fig. 10(b). Fig. 10(c) il- 
lustrates the ambiguity domain filtered by a separable kernel that 
is elongated along the lag axis to retain all auto-term energy. The 
separable kernel filters cross-terms while retaining all the auto- 
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Fig. 10. Ambiguity functions and TFDs of two close tones using a CSK TFD and a separable kernel TFD. (a) The optimized CSK in ambiguity domain, (b) CSK-TFD with 
optimized parameters, (c) Ambiguity domain representation of the optimal kernel TFD. (d) TFD representation after the application of the optimal kernel. 



terms, resulting in a higher-resolution TFD as shown in Fig. 10(d). 
The low-resolution performance of the CSK is due to the restric- 
tion in its formulation to have equal length for both Doppler and 
lag windows. 

2.3.2. Derivation of the extended compact support kernel 

The CSK can be extended by simply modifying the formulation 
of Doppler and lag windows so that their lengths can be adjusted 
independently. Such modified Doppler and lag windows can then 
be expressed as: 



f CD 2 




Gi (y) = | e c e v2 - Dl , 


|v| < D 


l o. 


otherwise 


f CE 2 




gi( r) = h c e^ 2 , 


|r|<£, 


o, 


otherwise. 



and 



(31) 



As mentioned above, the lengths of the modified Doppler and lag 
windows now depend on two parameters D and E. These param- 
eters are selected on the basis of shape and orientation of auto- 
terms in the ambiguity domain so that the maximum energy of 
auto-terms is retained and cross-terms are reduced. This leads to 
the following formulation of the Extended Compact Kernel (ECK): 



g(V,T) = G-t(v)g 2 (T) 

i cD 2 I 

J v 2_q2 

0 , 



r 2 — E 2 



| v | < D, |r| < E, 
otherwise. 



(32) 



The additional degree of freedom in the formulation of the ECK to 
independently adjust the lengths of the Doppler and lag windows 
is illustrated in Fig. 11. 

The resulting ECK performs smoothing along both time and fre- 
quency axes. Note that, like many other TFDs, it does not rigorously 
fulfill mathematical properties like time support, frequency sup- 
port, time and frequency marginals, and positivity ([33], p. 216). 
However, it satisfies the mathematical properties of realness, en- 
ergy conservation and (t, /) shift invariance [10], which are suffi- 
cient and convenient for a number of applications such as classifi- 
cation, as justified below. 

1. Real TFDs decrease the dimensionality of the classification 
problem as compared to complex TFDs. 

2. Time-shift and frequency-shift invariance are needed to ensure 
the consistent performance of pattern classification algorithm. 

3. The conservation of the total energy is necessary to preserve 
the relative energy between features of different classes of 
data. 

The TFD defined by the ECK kernel may be referred to as ECK TFD, 
but this abbreviation is further shortened to Compact Kernel Dis- 
tribution (CKD) for simplicity. 

2.3.3. Discrete-time implementation of the CKD 

The time-lag domain is often used for the discrete-time imple- 
mentation of TFDs for computational efficiency ([33], pp. 268-278). 
The discrete-time time-lag formulation of the TFD of the analytic 



B. Boas hash et al. / Digital Signal Processing 40(2015) 1 -30 



11 




(a) (b) 

Fig. 11. (a) The ECK: Quasi-similar to the rectangular window with a dominant Doppler direction, (b) The ECK: Quasi-similar to the Gaussian window with a dominant lag 
direction. (The dominant direction is shown by the red arrow). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of 
this article.) 



associate z[n] of the N-point real signal s[n] can be expressed as a) A synthetic multi-component signal composed of two LFM 
([33], pp. 237): components defined as: 



p[n, k] = 2 DFT(G[n, m] *(z[n + m]z*[n - m ])). (33) 

m^ir n v n 

The discrete-time variable n and discrete-frequency variable /< are 
related to the continuous-time and continuous-frequency variables 
by the following expressions ([33], pp. 232-241) 

n = tf s and k = f^~, (34) 

Is 

where f s is the sampling frequency. The resulting discrete TFD is a 
matrix of size NxM, where N is the number of time-domain sig- 
nal samples, and M is the number of frequency-domain samples. 
Following Eq. (32), the discrete time-lag formulation of the ECK is 
expressed as: 



cD 2 cE 2 

G[n, m] = DFT(e ( w ) 2 - d2 )e 2c e ^ )2 ~ e2 . (35) 

Its properties are discussed in the next section with respect to per- 
formance. 



2.3.4. Performance evaluation of the CKD 
Let us first consider an LFM signal as: 

s(t) = 4 rectj^J C os^2jr^/ O t+ |t 2 ^ +V^, (36) 

where A is the amplitude (assumed to be equal to 1), T is the total 
duration and represents a phase offset {if is often equal to 0), 
and rect[y] represents a rectangular window defined by: 



rect[r] = 



1 

0 , 



_1 < r < 1 
2 - Z - 2 

otherwise. 



(37) 



The frequency rate a is estimated by the difference between the 
maximal frequency / max and the initial frequency /o, divided by 
the total duration T : 




The sampling frequency is 1 Hz, the signal duration is equal 
to T = 255 s and the frequency rate a for both signals is the 
same and is given by: a = /max ' 1 r ~ /o ' 1 = (0233 5 - 01) = 0.0006, 
where / max , 1 and /o,i are respectively the maximal frequency 
and the initial frequency of the first LFM signal. 

b) An EEG seizure signal of 8-s duration sampled at 32 Hz. 

c) A fetal movement signal of 2.5-s duration sampled at 100 Hz. 

These signals are analyzed using the TFDs defined in previous sec- 
tions, including the EMBD, S-method, CSK-TFD, MBD, WVD, and 
CKD. The results are shown in Figs. 12, 13 and 14. The parameters 
of these TFDs for the synthetic signals are optimized with respect 
to Boashash-Sucic’s criterion [38] while for the real-life signal, 
their parameters are optimized on the basis of visual inspection 
(see Appendix A for the definition of this resolution measure). 

The resulting ‘P’ values for the TFDs of the synthetic signal are 
shown in Table 1. On the basis of both visual and quantitative anal- 
ysis, the CKD outperforms all the other TFDs for all these signals 
due to a better ability to reduce cross-terms while maintaining a 
high resolution of auto-terms. The S-method is the second best 
performing TFD for synthetic, real-life EEG and fetal movement sig- 
nals. Note that the performance of the MBD can be significantly 
improved by using a rectangular window of length smaller than 
the signal length, but such modification will make the MBD a lag 
dependent distribution, i.e., an EMBD. 

The CKD gives the best performance for signals whose auto- 
terms are nearly parallel to either time or frequency axis. However, 
its performance degrades for signals whose auto-terms have a spe- 
cific direction away from the time axis or frequency axis in the 
(t, /) domain. In such cases, the optimum can be reached us- 
ing directional filtering as a post-processing operation to design 
a high-resolution TFD as detailed in Section 4.2. 



a — 



/max /o 
T 



3. Performance assessment for multi-component instantaneous 
(38) frequency estimation 



The performance of the CKD defined above is evaluated using the For stationary signals, frequency estimation is used in a number 

following signals: of applications including music and speech to estimate the relative 
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Fig. 12. The TFDs of a multi-component signal composed of two LFM signals, (a) The CKD (c = 1, D — 0.05, E — 0.12). (b) The EMBD (a = 0.18, ($ = 0.06). (c) The CSK-TFD 
(c = 0.6, D = 0.05). (d) The WVD. (e) The MBD (fi = 0.16). (f) The S-method (Hamming window length = 4 s). 



strengths of each frequency component [39]. For non-stationary 
signal processing, the method extends to IF estimation which rep- 
resents the variation of the dominant frequency as a function of 
time 17]. This concept of IF is intrinsically linked to the concept of 
TFDs as the IF estimation capability of TFDs is often used as a cri- 
terion to evaluate their performance, including robustness against 
noise. The performance of the previously defined TFDs is compar- 
atively assessed below for both simulated and real-world signals 
in terms of IF performance estimation. This requires first to define 
clearly what is being estimated and the criteria for comparison. 
Let us consider first the simple case of mono component signal IF 
estimation. 

3.1 Meaning and estimation of the instantaneous frequency for 
mono-component signals 

In TFD based IF estimation the criteria of goodness for mono 
component signals are: no bias and minimum variance. These IF 
estimates can be defined as the peaks of TFDs or as their first mo- 
ments. One aim of defining high-resolution TFDs is to also yield 
high concentration TFDs, resulting in efficient IF estimates. 

As the concept of IF was introduced to represent the spectral 
variations of non-stationary signals [17], a simple example is the 
FM signals, for which the IF is the FM law. The easiest is the LFM, 



which is used in various applications, such as “vertical seismic 
prospecting” (VSP), where it is desired to study parameters such 
as soil absorption and dispersion 4]. For such signals, the WVD is 
the optimal representation in terms of energy concentration. 

Let us consider a mono-component analytic signal z(t ) modeled 
as z(t ) = a(t)e J< ^^\ such that a(t) is a low-frequency signal whose 
spectrum does not overlap with the high-frequency signal as 
per Eq. (1) and <p(t) is defined by <p(t) = arctan(Im(z(t))/Re(z(t)). 
The IF of the mono-component signal is defined as the time deriva- 
tive of its phase: 



m = 



i dm 

2n dt 



(40) 



A comparative review of fundamental algorithms for IF estimation 
appeared in [40]. More recent advances are reviewed next, setting 
the stage for the more difficult case in next section that is used 
as a test for comparing the new TFDs defined in the previous sec- 
tion. 



3.2. IF estimation for mono-component signals 

The main (t, f) approach to IF estimation follows the simple 
principle that the IF of a mono-component signal at each time in- 
stant can be estimated by detecting the location of peaks along the 
frequency axis in the (t, /) plane. 



B. Boas hash et al. / Digital Signal Processing 40(2015) 1 -30 



13 




2 4 G 8 ID 12 14 IG 

Frequency(Hz) 



(a) 




2 4 B 8 ID 12 14 IG 

Frequency(Hz) 



(c) 




Frequency(Hz) 

(e) 

Fig. 13. TFDs of a seizure signal, (a) The CKD (c = 1, D = 0.2, E = 0.05). (b) The EMBD 
(l 8 — 0.03). (f) The S-method (Hamming window length = 4 s). 

Table 1 

Resolution performance measure P of selected TFDs for a multi-component signal 
composed of two parallel LFM signals (The optimized parameters are obtained using 
an exhaustive search of the parameter that maximizes the resolution measure; it is 
not “optimal” in the classical sense. See details in Appendix A). 



Time frequency distribution 


Optimized parameters 


P 


Compact Kernel Distribution 


C = 1, D — 0.05, E = 0.125 


0.8919 


Extended Modified B distribution 


cy = 0.18, £ = 0.06 


0.8376 


Compact Support Kernel 


C = 0.6, D = 0.05 


0.7854 


Wigner-Ville Distribution 


Not available 


0.7063 


Modified B-distribution 


P = 0.16 


0.7628 


S-Method 


Hamming window length = 4 s 


0.8778 



Several algorithms are based on the above principle. For ex- 
ample, the WVD gives an ideal estimate for mono-component LFM 
signals so it can be used as an IF estimator in such cases. However, 
its estimate is biased for non-linearly FM signals [41], and so other 
methods are needed. In the general case, the bias can be overcome 
by estimating the IF from a windowed WVD, but windowing in- 
creases the variance of the IF estimate. An optimal window length 
for an optimum bias-variance tradeoff at each time instant can be 
estimated adaptively [41 ]. Following this approach, multiple WVDs 
can be used to obtain multiple estimates of the IF as well as the 
confidence interval for each estimate. Starting from the smallest 
window, the confidence intervals of the IF estimates of successive 
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(i a = 0.02, ft = 0.75). (c) The CSK-TFD (c = 0.1, D = 0.06). (d) The WVD. (e) The MBD 

windows are tested for overlap till the intersection of the confi- 
dence intervals of two successive windows results in a null set. 
The IF estimate of the longest window fulfilling the criterion of 
overlapping of confidence intervals is then selected as an optimal 
estimate. This method is known as the ICI rule where ICI means in- 
tersection of confidence intervals [41 . A modified approach further 
improves the accuracy of the ICI algorithm by taking into account 
the amount of overlap between two successive confidence inter- 
vals. Other iterative IF estimation algorithms can obtain the initial 
estimate of the IF from the Spectrogram [42]. The IP, defined as 
the integration of the IF, is then used to demodulate the original 
signal. The demodulation shifts the spectrum of the original sig- 
nal around the zero frequency. The IF of the demodulated signal 
is then estimated again and this process is iterated till there is no 
further change in two consecutive estimates of the IF. 

Most QTFDs fail to accurately estimate the IF at very low SNR. 
In such scenarios improvements can be obtained by taking into ac- 
count the main direction in the (t, /) plane where there is most 
energy concentration. For example, the adaptive combination of 
the directionally smoothed WVDs was shown to obtain a (t, /) 
representation that has high concentration of signal energy along 
the IF of the signal even at a very low SNR [43]. Such methods 
need to be further extended to deal with the more general multi- 
component case as addressed in the next section. 
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Fig. 14. TFDs of a fetal movement signal, (a) The CKD (c = 1, D = 0.075, E = 0.075). (b) The EMBD (a = 0.125, ft = 0.075). (c) The CSK-TFD (c = 1, D = 0.075). (d) The WVD. 
(e) The MBD (ft = 0.05). (f) The S-method (Hamming window length = 4 s). 



3.3. IF estimation for multi-component signals 



For a multi-component signal, the (t, /) approach follows the 
principle that a multi-component analytic signal may be expressed 
as the sum of N c mono-component signals, i.e., 



N c N c 

Z(t) = X>(t) = X>(t)e #i(t) (41) 

i=l i=l 

The IF of the ith signal component is defined by the time deriva- 
tive of its phase as 



fi(t) = 



i mt )_ 

2n dt 



(42) 



If the individual components of the signal are FM signals, then the 
multi-component signal appears as multiple ridges in the (t, /) 
plane. Multi-component IF estimation algorithms need then to in- 
volve both detection and linking of related peaks to estimate the 
IF of each signal component. The linkage stage is needed so that 
the IF of each component can be tracked accurately and without 
mix-up with the IFs of other components. 



3.3.1. IF linking algorithms 

Image processing techniques can be used to link all the points 
of each IF in the (t, /) domain [44]. The key stages involve: 



1. Detecting peaks in a TFD by using first and second order par- 
tial derivatives along the frequency axis. 

2. Linking the detected peaks using some neighborhood-con- 
nectivity criteria [44]. For example, using a 10-neighborhood 
connectivity criterion, a peak belonging to the IF of a signal 
component must have at least one other peak of the IF of the 
same signal component in its 10-neighborhood and it should 
not have any peak of the IF of any other signal component in 
this selected neighborhood. The 10-neighborhood of a peak at 
the location (x, y) is defined as a set of all pixels at the fol- 
lowing locations [44]. 

f (x+ 1, y) (x+l,y + l) (x + 1 , y + 2) (x+l,y-l) (x+l,y-2)l 

l(x-l,y) (x — 1, y + 1) (x — 1 , y — 2) (x-l,y-l) (x-l,y-2)J 

(43) 

This image processing linking algorithm is fully automatic and does 
not require prior information like the number of components, their 
IF laws and SNR. 

3.3.2. Multi-component IF estimation based on component extraction 
and ICI algorithm 

Let us consider for illustration another method, the ICI based 
IF estimation algorithm which was extended to multi-component 
signals by first extracting signal components using a blind source 
separation algorithm 45]. This algorithm iteratively extracts signal 
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Fig. 15. (a) The CKD (c = 1, D = 0.2, E — 0.05) of an EEG signal, (b) The IFs of signal components estimated using the method described in [44]. 





(b) 



Fig. 16. (a) The MSE of the IF estimates of si(t) obtained using high-resolution TFDs vs. SNR. (b) The MSE of the IF estimates of S 2 (t) obtained using high-resolution TFDs vs. 
SNR. In all these cases the IFs are estimated using the component extraction procedure [47]. 



components from the (t, /) plane till the total energy of the signal 
falls below a certain threshold. The disadvantage of this algorithm 
is its dependence on prior information like total number of signal 
components or SNR of a signal. 

A comparative study has shown that the above image pro- 
cessing algorithm based on connected component linking is more 
suitable for estimating the IF of the physiological signals consid- 
ered as it is computationally efficient as compared to the com- 
ponent extraction method [46]. Fig. 15 illustrates an example of 
IF estimation of a multi-component EEG signal using the im- 
age processing technique described in [44]. Both ICI with blind 
source separation and connected component linking algorithms 
can only be applied to (t, /) separable signals, i.e., signals whose 
components do not overlap with each other in the (t, /) do- 
main. Many of the real-world signals do not fulfill this property 
so these methods are not general and should be applied only 
to relevant classes of signals, therefore requiring a-priori knowl- 
edge. 

In order to design high performance multi-component IF es- 
timation algorithms for the general case, it is important to re- 
member that the performance of multi-component IF estimation 
algorithms depend on the resolution and cross-term suppression 
capabilities of the TFD employed. Recent studies have indicated 
that data-dependent high-resolution TFDs like the Adaptive Frac- 
tional Spectrogram (AFS) [47] and variable-bandwidth filter based 
heterogeneous TFD [9] can outperform other TFDs in terms of their 
ability to accurately estimate the IFs of closely spaced signal com- 
ponents at low SNR. The improvement is naturally obtained using 
an adaptive procedure so as to get a “perfect match” with the data. 
The previous TFDs are therefore compared below in terms of their 
IF estimation performance. The AFS is included in the comparison 
as an example of data dependent TFD. 



3.3.3. Comparison of TFDs in terms of their multi-component IF 
estimation performance 

In order to evaluate the IF estimation capability of the TFDs 
designed above, let us consider a two-component real signal ex- 
pressed as: 



S(t) = Si(t)+S 2 (t), 


(44) 


where 






si (t) = rect 


[ t - r ,28 ]co S (2„(o,r + °; 5 t>)). 


(45) 


s 2 (t) = rect 


[% ,28 ]c„s(2*(o,5 t+ °;V)), 


(46) 



with time duration T = 256 s and sampling frequency f s = 1 Hz. 
The corresponding IFs are 



fi(t) = 0.3y +0.1, (47) 

f 2 (t) = 0.3^ +0.15. (48) 

The TFDs considered are the EMBD, Spectrogram, CKD and AFS; 
for the latter the IF is estimated using the component extraction 
procedure as discussed in [47]. The mean square error (MSE) is 
estimated by performing 200 Monte Carlo simulations [48]. Fig. 16 
displays the results of the IF estimates. The results indicate that 
the CKD gives more accurate estimates of the IFs as compared to 
other non-adaptive TFDs, and naturally, the AFS does even better 
due to the adaptation. 

The above findings indicate that an accurate estimation of the 
IF of the signal components is directly related to the cross-term 
suppression and auto-term resolution properties of (t, f) methods 
employed. Therefore, there is a need to develop better methods 
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for improving the resolution of TFDs. The following section will 
discuss image processing methods for improving the resolution of 
TFDs. 

4. Time-frequency image processing techniques for TFD 
enhancement 

Clarity of representation is essential for TFDs, so that their rele- 
vant features can be read, selected and extracted. As all QTFDs suf- 
fer from limitations such as inherent compromise between cross- 
term suppression and auto-term resolution, it may adversely affect 
the extraction of some relevant features from ( t , /) images as well 
as components separation. To improve the readability, let us con- 
sider the TFD as a (t, f) image. Then, image processing techniques 
can be used as a processing stage to either improve the resolu- 
tion of cross-term-free low-resolution TFDs (e.g., the Spectrogram) 
using de-blurring methods or further suppress cross-terms in high- 
resolution TFDs (e.g., the WVD). Section 4.1 reviews existing image 
processing techniques for TFD image enhancement, and Section 4.2 
presents an improved novel image processing technique for im- 
proving the resolution of the class of QTFDs considered and there- 
fore allowing improved multi-component IF estimation. 

4.1. TFD enhancement using image processing techniques 

Given the number of different ways to generate a high- 
resolution TFD and the imperfections associated with each one of 
them, a natural approach is to process the TFD image using image 
processing techniques to improve the clarity of the image. Three 
types of processing are reviewed in the following sections: (t, /) 
image de-blurring, ( t , f) de-noising and (t, f ) image artifact sup- 
pression. 

4.1.1. Time-frequency image de-blurring 

Image de-convolution techniques are used for the estimation of 
high resolution images from blurred ones. For example, these tech- 
niques have been applied to estimate an original high-resolution 
TFD from the Spectrogram, which can be interpreted as a blurred 
version of the true (t, /) image [34]. Another method reduces 
the blurring in the Spectrogram by using artificial neural net- 
works [49]. This technique estimates the inverse of the blurring 
function using neural networks, which are trained using a pre- 
computed Spectrogram and the WVD. The trained neural networks 
are then applied to the Spectrogram to estimate the true high- 
resolution TFD. The disadvantage of this technique is that it results 
in a discontinuous TFD [49]. 

4.1.2. Time-frequency image de-noising 

One way to enhance (t, /) images is to filter out noise to im- 
prove their SNR, using image de-noising techniques. Such tech- 
niques often assume an additive white noise model to account for 
the distribution of noise in images. Unfortunately, this model is 
not valid in general for TFDs as earlier studies have shown, that 
apart from the WVD, noise is not uniformly distributed in (t, f) 
images [50]. The influence of noise is stronger in the region of sup- 
port of auto-terms as noise appearing outside the region of support 
of the auto-terms is reduced due to the low-pass filtering applied 
in the ambiguity domain. So, (t, /) image de-noising algorithms 
should be designed by taking into account the characteristics of 
noise distribution in QTFDs. 

Some of the image de-noising algorithms that have been ap- 
plied to (t, f) images are discussed below. 

a) Singular Value Decomposition (SVD) based methods: As sig- 
nals represented in the (t, f) domain are often characterized 
by just a few components, then, the resulting TFD p(n,k) can 



be considered as a sparse matrix. Such TFD is in general not 
full rank and its SVD can be expressed as [51]: 

p(n,k) =UDV h , (49) 

where U is an orthogonal matrix, V H is the conjugate trans- 
pose of the orthogonal matrix V, and D = diag(ai,« 2 , • • • , 
ofjv- r +i, • • • , aw) is a diagonal matrix in which the last r sin- 
gular values are equal to 0 ([aqv-r+i, ctN-r, • , (Xn] = 0j). In 
the presence of white noise, the TFD matrix becomes: 

pin, k) = UDV h + N = [UiDiVi] h . (50) 

The TFD matrix is now full rank with its last r singular values 
being non-zero. 

Based on this property, a simple and fast de-noising algorithm 
can be designed as follows 51]: 

- Divide the TFD matrix into sub-blocks B s . 

- Perform SVD on each sub-block B s = U s D s Vf and set to zero 
the last singular values that are smaller than a threshold e. 

- Replace the submatrix B s by B s = U s D s Vf (D s is the modi- 
fied diagonal matrix D). 

- Reconstruct the denoised matrix p(n,k). 

Fig. 17 illustrates the enhancement obtained using the afore- 
mentioned (t, /) image de-noising algorithm. 

Another SVD based de-noising method reduces noise by apply- 
ing low-pass filtering to singular vectors [52]. The smoothed 
singular vectors are then used to synthesize a de-noised 
TFD. Experimental results demonstrate that this method can 
suppress noise without deteriorating the basic structure of 
TFDs [52]. 

b) Wavelet based methods: These image de-noising techniques 
for improving the SNR of (t, f) images are based on the as- 
sumption that most of the (t, f) image energy is concentrated 
in only few coefficients of the Wavelet Transform (WT). The 
WT is obtained by decomposing a TFD into several bands us- 
ing a low-pass filter and a high-pass filter [53]. So, a (t, /) 
image is de-noised by assigning zero value to coefficients be- 
low a certain threshold and then inverting the WT. Most of the 
wavelet de-noising techniques use additive white noise model 
for noise. This model may be considered as a rough approxi- 
mation of the noise in QTFDs. 

c) Morphological processing: These techniques exploit the fact 
that signal components appear as ridges/curves occupying 
large connected regions in a (t, /) plane. On the other hand, 
noise is randomly distributed in a (t, f) plane and appears as 
small objects or disconnected regions [53]. These small objects 
can be eliminated from (t, /) images by using the image open- 
ing operation [54]. This operation compares a predetermined 
template (often called the structuring element) with objects in 
an image; if the size of an object is smaller than that of the 
template, it is deleted from the image; otherwise it is retained. 

d) Anisotropic diffusion [55]: Traditional 2D low-pass filtering 
schemes result in blurring in all directions regardless of object 
boundaries. The anisotropic diffusion overcomes this limita- 
tion by performing edge-preserving image enhancement such 
that the smoothing is performed on all the points in the (t, /) 
plane except the points lying near the objects boundaries (i.e., 
near the boundaries of support regions of the signal compo- 
nents). 

4. 1.3. Cross- term suppression 

Three separate techniques are reviewed below that use image 
processing techniques. 

a) Some techniques use morphological image reconstruction to 
suppress interference terms in the WVD 11]. Such techniques 
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Fig. 17. Illustration of the (t, /) image denoising algorithm, (a) The MBD of a two-component quadratic FM signal, (b) The MBD of the same signal corrupted by additive 
white Gaussian noise (SNR = 0 dB). (c) The denoised MBD obtained using the algorithm described in [51]. 



usually employ two images; one is a marker that contains the 
starting points of the transformation, while the other one is 
a mask that constrains the transformation. For example, the 
technique described in [11] processes the WVD, a marker in 
this case, using the characteristics of the Spectrogram to form 
a mask. The processing of the WVD starts at peaks. These 
peaks are then dilated while the thinned Spectrogram con- 
strains the spreading of the peaks. This process continues until 
the marker stops changing. This technique combines the ad- 
vantages of the WVD and Spectrogram so that the resulting 
representation has high energy concentration and is also cross- 
terms free. 

b) Another approach uses a non-linear center affine filter to sup- 
press the cross-terms in the WVD 56]; such a filter is de- 
signed to exploit the local statistics of a (t, /) image to adapt 
its shape. In the regions of high variance it transforms itself 
into a low-pass filter while in the regions of low variance it 
adapts itself into an identity operator. Thus it aims to reduce 
cross-terms while retaining the shape of auto-terms [56]. 

The limitation of the above cross-term suppression techniques 
[11,56] is that they fail to give optimum results when cross- 
terms overlap with auto-terms. 

c) A hybrid image and signal processing technique can be used 
to suppress cross-terms in the WVD even in scenarios when 
cross-terms overlap with auto-terms [12]. The key stages of 
this technique are shown in Fig. 18. The technique uses a 
blurred cross-term-free reduced-interference TFD to locate sig- 
nal component in a ( t , f ) plane. The reduced-interference TFD 



is then segmented using the connectivity criterion discussed in 
Section 3.3; each segment corresponds to a signal component 
in the time domain. Fractional Fourier filtering is then applied 
to perform time-varying filtering so as to extract signal com- 
ponents from the original signal. The extracted components 
are analyzed separately using the masked WVDs to suppress 
both inner and outer interference terms. The masked WVDs 
of the separated signal components are added up to obtain a 
cross-term free high-resolution TFD. This technique can also 
be used for estimating the IFs of multi-component signals as 
the extraction of signal components reduces the problem of IF 
estimation of multi-component signals to the basic IF estima- 
tion of mono-component signals. This technique can be more 
broadly applied than the first two methods, but it is still lim- 
ited to signals with non-overlapping (t, /) signal components. 
The step by step illustration of this third approach for a two- 
component signal is shown in Fig. 19. 

All the above mentioned (t, /) image processing methods are 
useful tools for the improvement of TFDs readability and energy 
concentration. However, these methods cannot overcome the fun- 
damental resolution limitation of QTFDs in dealing with closely 
placed signal components. The following subsection presents a 
different approach combining directional filtering and image pro- 
cessing techniques for defining a high-resolution TFD that can 
better deal with such a situation and resolve closely placed sig- 
nal components. 
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Fig. 18. Flow chart of the TFD resolution enhancement technique [30]. 




( d ) (e) (f) 

Fig. 19. Step by step illustration of a hybrid approach combining image and signal processing techniques to reduce cross-terms [12]. (a) The WVD. (b) The reduced interference 
QTFD. (c) (t, f ) image segmented such that each segment corresponds to a signal component, (d) The WVD of the first signal component (extracted using fractional filtering), 
(e) The WVD of the second component (extracted using fractional filtering), (f) The desired (t, /) representation obtained by first masking the WVDs of the extracted signal 
components with Spectrogram and then adding them, thus illustrating the image processing approach to improve TFDs’ resolution. 



4.2. Time-frequency image enhancement using adaptive directional 
filtering 

The performance that can be achieved by combining a ( t , f) 
approach with an image processing approach can be further en- 
hanced by using an adaptive directional filtering based (t, f ) image 
enhancement technique. This refinement includes adapting the di- 
rection of a (t, /) kernel at each ( t , /) point on the basis of the 



direction of energy concentration resulting, in essence, in the defi- 
nition of another high-resolution TFD. 

4.2.1. Review of signal dependent TFDs with automatic parameter 
estimation 

Most TFDs require optimization of their parameters to obtain a 
clear and accurate (t, /) representation with the best possible res- 
olution; e.g. incorrect selection of window length for the Spectro- 
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gram can result in a significantly different visual representation to 
the correct representation (see [33], p. 39). So, there is a need for 
some kind of automatic procedure for the selection of parameters 
in applications requiring real-time computations. Most of the exist- 
ing adaptive TFDs optimize the parameters of their kernels, either 
globally or locally, to maximize a certain measure representing the 
quality of a TFD [57-60]. Some of the commonly used quantitative 
measures for assessing TFD quality include the Kurtosis (Ratio of 
norms) [61], the Renyi Entropy [60], and the energy concentration 
measure [57]. These and other measures are described in detail in 
Section 5.3. For any given quality measure, the parameters of TFDs 
can be optimized by either using an exhaustive search technique 
or employing more computationally efficient methods such as the 
steepest decent recursive algorithm [57]. 

4.2.2. Design of the adaptive TFD 

Previous studies have reported that for underspread signals, 
directional filtering along the major axis of the auto-terms can 
significantly reduce the cross-terms without affecting the resolu- 
tion of the auto-terms [32,62]. These schemes naturally require 
prior information about the angle formed by the auto-terms and 
the time axis. The angle formed by the auto-terms and the time 
axis can be estimated from the moments of the fractional Fourier 
transform [63 for signals whose auto-terms have only one direc- 
tion of energy concentration in a ( t , f ) plane. In general, there are 
several possible directions of energy concentration in the (t, /) do- 
main and this would require separate filters for all these directions 
at a local level. Another solution is to design a post-processing 
technique to further improve high-resolution TFDs by adapting the 
direction of a smoothing kernel locally for each point in the (t, f) 
plane. Such adaptive kernel TFD can be expressed as an extension 
of Eq. (6), i.e.: 

Padapt(f> /) — Y0(t,f)(t’ /)(**) /)> (51) 

where p a dapt(G /) is the adaptive directional TFD (ADTFD), 
Yo(t,f)(t, /) is an adaptive kernel whose direction is adapted at 
each point in the (t, /) plane, and p(t, f ) is the QTFD used to 
start the process. A directional smoothing kernel should have the 
following properties: 

• The kernel should have the maximum response when aligned 
with ridges. This implies that the kernel should have low-pass 
characteristics along its major axis. 

• The kernel output should be zero for non-ridge points as oth- 
erwise spectral leakages would be observed at (t, f ) points 
where no signal is present. 




0.3 0.4 0.5 O.G 0.7 

Frequency (Hz) 



Fig. 20. Illustration of the direction of the auto-terms as well as the direction of the 
oscillation of cross-terms in the WVD of a signal composed of two LFM components. 



1. Cross-terms are oscillatory in the direction of their major axis, 
but auto-terms are not, as illustrated in Fig. 20. 

2. Filtering along the major axis of the auto-terms does not blur 
a TFD. 

3. Both auto-terms and cross-terms appear as ridges in a QTFD. 



The implication is that smoothing (low-pass filtering) along the 
major axis of cross-terms will suppress these cross-terms as they 
are oscillatory along their major axis (and they are localized away 
from the origin in the ambiguity domain) while smoothing along 
the major axis of the auto-terms will not affect their resolu- 
tion [67]. The direction of the smoothing filter for each (t, /) point 
is adapted to the direction of the major axis of ridges of a magni- 
tude squared QTFD. The reason is that cross-terms and auto-terms 
of a QTFD appear as ridges when the squared magnitudes are 
used because the squared magnitudes of cross-terms have non- 
oscillatory characteristics in the (t, /) domain due to the squaring 
operation. The ridge direction (i.e., the direction of maximum en- 
ergy of a TFD) is estimated by maximizing the correlation of the 
DGF with the magnitude squared QTFD; expressed as: 



%,/) = arg max 
o 



J J \p(t -t'* f - f')\ 2 Ye(tj)(t\ 

R R 



(53) 



In the above equation, p(t, f ) is squared to remove the oscilla- 
tion in the cross-terms as discussed above; on the other hand, 
Yo(t,f)(t> f) is not squared as, otherwise, it would change the 
shape of the DGF filter. Eq. (53) can be numerically solved by per- 
forming the following steps: 



The second-order partial derivative of a Directional Gaussian Fil- 
ter (DGF) is selected as a directional kernel as it fulfills the above 
properties. This filter has a maximum response when parallel to 
ridges because of low-pass characteristics along its major axis and 
its response reduces to zero as the filter is orthogonal to ridges be- 
cause of differentiation along its minor axis [64-66]. It is defined 
as: 

d 2[ e -((at e ) 2 +(bfe) 2 )] 

/) = —2 , 

oJq 

te=tzos(0) + /sin(0), f e = /cos(0) - tsin(0), (52) 

where 6 is the rotation angle with respect to the time-axis, a and 
b control the spread of the DGF along the time and frequency axis 
respectively. The design criterion to choose the direction of the 
adaptive kernel for each point in the (t, /) plane is based on the 
following observations [67]: 



• Convolve a magnitude squared TFD with I< directional filters 
such that the angle of each filter is given by: 

0k = 2 ff, k = h--,K. (54) 

i\ 

• For each ( t , /) point, choose the angle that maximizes the 
magnitude of a directionally smoothed TFD. 

Otj = aTgmax\y dk (t, f) ** \p(t, f)\ 2 \ 2 . (55) 

K k (t,f) 

The block diagram for the implementation of the proposed ADTFD 
is shown in Fig. 21. 

4.2.3. Performance evaluation 

In order to evaluate the performance of the ADTFD, let us con- 
sider three multi-component signals with the characteristics that 
their TFD auto-terms are parallel neither to the time axis nor to 
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z(t) 




Fig. 21. Implementation of the adaptive kernel TFD. (Note that Ok = 2 nk/K, where k = 1,2,... K.) 



Table 2 

Resolution performance of selected TFDs for a multi-component signal composed of 
two parallel LFM signals. 



TFD 


Optimal kernel parameters 


P 


CKD 


c = 1, E — 0.12, D — 0.05 


0.8919 


Adaptive TFD 


a — 2, b — 20 


0.9648 


S-Method 


Hamming window length = 4 s 


0.8778 


Extended Modified B distribution 


a = 0.18, £ = 0.06 


0.8376 


Modified B-distribution 


£ = 0.16 


0.7628 



the frequency axis. Most separable kernel TFDs fail to achieve a 
high (t, /) resolution for these signals because of the absence of 
the (t, /) energy concentration direction parameters in their de- 
sign. Such parameters relate to the presence and localization of 
the auto-terms. 

a) Two parallel LFM signals: The multi-component signal com- 
posed of two parallel LFM, described in Section 2.3.4, is used, 
for consistency and continuity, to evaluate the performance 
of the DGF adaptive (t, /) image enhancement technique, as 
compared with the CKD and S-method, as they are found in 
this study to be the best performing QTFDs in terms of res- 
olution and cross-term suppression capabilities. Fig. 22 illus- 
trates the results. The performance of the proposed TFD is 
evaluated using the Boashash-Sucic’s criterion [38], with re- 
sulting P values recorded in Table 2. The proposed adaptive 
TFD outperforms the CKD by 7% in terms of its ability to retain 
energy concentration of auto-terms while suppressing cross- 
terms. 

b) Multi-component signal composed of signal components of 
varying amplitudes and frequency rates: let us consider a 
multi-component signal composed of two LFMs, one Gaussian 
atom and one tone, expressed as: 

_ | s i (0 + s 2(0 + 53 (f) + S4(t), 0 <t<T, . . 

^ — [ 0, otherwise, ^ 

where the signal components have the characteristics shown 
below. 



Signals 


si(t) 


S2 (t) 


Formula 


cos(2jt (0.0006t 2 +0.16t)) cos(2jt (0.0006t 2 + 0.10) 


Signals 


S3 (0 


s 4(0 


Formula 


0.75cos(0.087Tt) 


2e -0.0025(t— 175) 2 sin(0.257Tt) 


The IFs of the signal components are, 


IF 


hit) 


hit) hit) hit) 


Formula 


0.0012t + 0.16 


0.0012t + 0.1 0.04 0.125 


The sampling frequency is 
T = 255 s for the signal 1 


1 Hz and the signal duration is 
components Si(t),S 2 (t) and S 3 (t). 



The corresponding effective time duration T e and effective 
bandwidth B e of S 4 (t) are given by T e = 20^/n /2 s and 
B e = 0.05/V27 r Hz (see [68] for these definitions). The to- 
tal signal has more than one direction of energy concentration 
and the distribution of the signal energy along each direction 
is not uniform. Given such complexity, this signal is difficult to 
analyze using existing fixed-kernel TFDs because these meth- 
ods do not adapt the kernel locally at each (t, /) point and any 
particular choice would be adapted only to one component, 
but would blur the others. The performance of the DGF (t, /) 
image enhancement technique is compared with the CKD and 
another adaptive TFD defined earlier [58,59] with results as 
shown in Fig. 23. The ADTFD gives high energy concentration 
for all signal components, whereas the CKD and the adaptive 
TFD [59] fail to concentrate energy for the weak signal compo- 
nent i.e., the tone as observed in the left-hand side of Fig. 23. 
c) Real-life EEG signal: The previous experiment is repeated us- 
ing a real-life EEG signal described in Section 2.3.4. Fig. 24 
shows that the ADTFD gives high energy concentration for the 
low-amplitude signal components appearing e.g., above 5 Hz, 
whereas other TFDs fail to concentrate energy for the weak 
signal components. 

The advantage of QTFDs such as the CKD is that they are useful for 
most non-stationary signals with different performance depending 
on the type of signal analyzed. In scenarios when such methods 
fail to give result with enough resolution, then image processing 
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(a) (b) 




Frequency (Hz) 

(c) 



Fig. 22. ( t , f ) representations of two parallel LFM signals, (a) The CKD (c = 1, E = 0.12, D = 0.05). (b) The adaptive (t, f ) image enhancement by DGF (a — 2,b — 20) and 
(c) the S-Method (Hamming window length = 4 s). 




Fig. 23. The (t, f) representations of a multi-component signal composed of signal components of varying amplitudes and chirp rates, (a) The CKD (C = 0.1, E = 0.065, 
D = 0.065). (b) The Adaptive kernel TFD [59]. (c) The ADTFD (a = 3,b = 8). 



based adaptive techniques can be used as shown in Figs. 22, 23 
and 24. These examples illustrate the possible advantage to use 
image processing techniques such as directional filtering for defin- 
ing high-resolution cross-term free TFDs. The additional load for 
doing so is assessed in the next section. 

4.2.4. Computational cost of the adaptive TFD 

The computation of the adaptive TFD involves the following 
steps. 



1. Computation of a QTFD: The computational cost of this step 
is 0(NM log M) as it requires N FT operations for a sig- 
nal of length N and the computational cost of each FT is 
O(MlogM), where M is the total number of frequency do- 
main samples [69]. 

2. Computation of an adaptive kernel: The computation of an 
adaptive directional kernel requires the estimation of an op- 
timal rotation angle for each point in a (t, f ) plane. It can be 
implemented by convolving a QTFD with I< directional kernels 



22 



B. Boashash etal./ Digital Signal Processing 40(2015) 1 -30 




Fig. 24. The (t, f ) representations of a real-life EEG signal, (a) The CKD (c = 1, D = 0.2, E = 0.05). (b) The Adaptive kernel TFD [59]. (c) The ADTFD (a = 3, b = 8). 



and choosing the one with maximum response at each ( t , /) 
point. The computational cost of this step is O(KNM) because 
of the convolution of a QTFD with I< directional kernels; given 
that the computational cost of single convolution operation is 
0(NM ) as the required number of multiplications and addi- 
tions is proportional to the total number of points in a (t, /) 
plane. 

3. Convolution of a QTFD with an adaptive kernel: The compu- 
tation cost of this step is 0(NM ) because it involves only a 
single convolution. 

The total computational cost of this algorithm is equal to the sum 
of the computational costs of all the above mentioned steps, i.e., 
0(NM log M + KNM + NM) or 0(NMlogM + (K + 1)NM). The 
computational load can be significantly reduced by exploiting the 
fact that ( t , /) matrices are sparse so most of the ( t , /) points do 
not need filtering at all. 

This section has illustrated that the image processing based 
methods can be used as post-processing operations to enhance 
QTFDs. Such post-processing operations not only improves the 
readability of QTFDs for signal analysis, but also helps in extract- 
ing highly discriminating features for pattern recognition applica- 
tions. A methodology for defining such (t, /) features from high- 
resolution TFDs is discussed in the next section. 

5. Time-frequency feature extraction 

Signal feature extraction is a key stage of any overall scheme 
for pattern recognition and classification of abnormalities or, more 
generally for any machine learning design and algorithm that re- 
quires automatic decision making. The spectral characteristics of a 
signal are traditionally used to obtain specific details about the un- 
derlying signal. Such frequency-domain features [26] include spec- 
tral flatness, spectral flux, entropy and average frequency. In the 
case of non-stationary signals with spectral content varying with 
time, such frequency-domain features may fail to give sufficient 
relevant discriminating information. The following subsections dis- 
cuss in detail the (t, /) extension of some selected frequency- 



domain features, as an illustration of the (t, /) feature extraction 
methodology. 

5.1. Time-frequency flatness 



The (t, /) flatness feature extends the notion of spectral flat- 
ness; it is used to measure the relative concentration of energy 
distribution in a TFD or conversely, its spread. A high (t, /) flat- 
ness indicates that the energy is uniformly spread in the TFD while 
a low (t, /) flatness indicates that the energy is concentrated in 
specific regions within the TFD. This feature helps, e.g., to discrim- 
inate TFDs of signal components embedded in noise from TFDs of 
pure noise. 

In order to derive the mathematical formulation for the (t, /) 
flatness, let us first consider the definition of the spectral flatness. 
The spectral flatness is the geometric mean of the FT of a signal 
normalized by its arithmetic mean; expressed as [70]: 



SF = M^. 

(Efcii \m\) 



(57) 



The (t, /) extension of the spectral flatness can then be defined 
by replacing the spectrum of the signal by its TFD. Eq. (57) then 
becomes the geometric mean of a TFD normalized by its arithmetic 
mean, which can be expressed as: 



TFflatness — 



(n^^pljhk^ 
mE n l,Ef=i Pln,k] 



(58) 



The (t, /) flatness assumes a zero value even if there is a single 
zero in a TFD. So, in practical implementations all zeroes of a TFD 
are replaced by very small values (e.g., epsilon in Matlab). 



5.2. Time-frequency flux 



The (t, /) flux extends the stationary spectral flux measure 
to non-stationary signals. The spectral flux measures the rate of 
change of the frequency content of a signal with time, whereas 
the ( t , /) flux measures the rate of change of the signal energy in 
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the ( t , /) plane both along time and frequency axes. It is expressed resolution but different levels of cross-term suppression. It is 

as: defined as ([33], p. 299, [72]): 



N-lM-q 

TFflux = Y) 53 + l > k + <?] - /°t n ’ fe ]|. 

n= 1 k=l 



(59) 



TFRNE = — - log 2 




p[n,k] 



E^ilPMJ 



(62) 



where / and q are predetermined values that depend on the rate of 
change of signal energy in a (t, /) plane. The values of the param- 
eters used in this study are: / = 1 and q = 1. This measure can be 
used to detect seizures in EEG signals as previous studies indicate 
that the energy of the newborn EEG seizure signals varies slowly 
along both time and frequency axes. This is inferred from the fact 
that these signals can either be modeled as piecewise LFM signals 
with slowly changing frequency or as train of impulses [1]. On the 
other hand, the EEG background appears as a random pattern. It 
follows that the (t, /) flux is expected to have a lower value for 
EEG seizure signals and a larger value for EEG background. 

5.3. Time-frequency measures of complexity 

A measure of complexity can be defined by interpreting a 
TFD as a quasi-probability distribution function, and using entropy 
measures as characteristics. The interpretation is that a highly con- 
centrated TFD with a small number of components has a lower 
entropy than a signal with a large number of signal components. 
Some of the entropy measures used in previous studies are de- 
fined below and compared in terms of their relevance to the main 
objectives of this study. 

1. Time-Frequency Shannon Entropy (TFSE): It is defined as ([33], 
p. 299): 



N M 

TFSE = - £ £ p[n, k] log 2 (p[n, k\). (60) 

n=l k = 1 

The major limitation of the (t, /) Shannon entropy defined 
above is that it cannot be used for the majority of QTFDs as 
these can take negative values, since the above expression is 
then not valid (log of the TFD). To remedy this, the Renyi En- 
tropy was used instead ([33], p. 298). 

2. Time-Frequency Renyi Entropy: It is defined as ([33], p. 298): 



Some other complexity measures that are not related to entropy, 
but can be used to discriminate a low resolution TFD from a high- 
resolution TFD, are mentioned below. 



• Kurtosis (or Ratio of norms): It is defined as the ratio of the 
fourth-order moment of a TFD with the squared second order 
moment [61]: 



RN = 



En = l Ektl P>,/<] 
(Enll El 1 P 2 ln,k]p- 



(63) 



This kurtosis is used to estimate the “peakedness” of TFDs by 
calculating the above ratio. A high value of the ratio of norms 
indicates that the TFD is highly concentrated while a low value 
indicates that the TFD is poorly concentrated. In many real-life 
situations, signal components can have significant variations 
in their relative amplitudes. For such signals, a TFD optimized 
based on the kurtosis can achieve a high energy concentration 
for strong components at the expense of poor energy concen- 
tration for weak signal components. 

• Energy concentration measure: One approach to estimate the 
spread of the energy distribution in the ( t , /) domain for 
any given TFD is to measure its support region (e.g., the 
number of non-zero values). A highly concentrated TFD will 
have a smaller region of support as compared to a blurred 
TFD. The following energy concentration measure can be used 
for estimating this support region for a normalized TFD (i.e., 
EE=iEfc=i/°[n.k] = U[57]. 



EC = 



N M 






q 



Q> 1 



(64) 



Unlike the kurtosis, TFDs optimized on the basis of this mea- 
sure do not suffer from the problem of poor energy concen- 
tration for weak signal components [57]. 



N M \ 

( 61 ) 

n = 1 k= 1 / 

with a > 2; for a = 2, the above measure becomes equal to 
zero for a normalized TFD (i.e., J2n=i Hh=\ P a l n , k] = 1 for 
a — 2). The (t, /) Renyi Entropy overcomes the limitations of 
the TFSE by applying the log operator to J2n = l ZjUii [n, /<], 
where J2n=i J2k= i P a [nJ<\ > 0 for any valid TFD. The (t, /) 
Renyi entropy is a useful feature for the classification of non- 
stationary signals because of its sensitivity to the number of 
signal components, their time duration and bandwidth, and 
their amplitude ratios [71]. 

3. Time-frequency normalized Renyi entropy (TFNRE): Zero- 
mean cross-terms are reduced in the (t, /) Renyi entropy for 
odd values of a because of the summation operation, so the 
( t , /) Renyi entropy cannot discriminate a high-resolution TFD 
with significantly reduced cross-terms from a high-resolution 
TFD without any suppression of cross-terms. This limitation of 
the (t, /) Renyi entropy can be overcome by first normalizing 
the TFD with the sum of its absolute values so that cross- 
terms have an overall effect of reducing the TFNRE. The TFNRE 
is thus preferred to evaluate the resolution performance of 
a TFD as it can differentiate between two TFDs of the same 



In the reminder of this study, the normalized Renyi entropy is 
used as the results confirm that it gives better classification re- 
sults as compared to the Renyi entropy. There are other measures 
of complexity that have been defined and used, e.g., in [73]; but 
they are not considered here due to space, scope and relevance 
limitations. Note that this section has illustrated a methodology 
to define new (t, /) features by extending frequency-domain fea- 
tures to the (t, /) domain. A similar methodology can also be used 
to define new features by extending time-domain features to the 
(t, /) domain [26]. A list of such features is given in Appendix D. 

6. Results, discussion and further insights 

Receiver Operating Characteristics (ROC) analysis is used to 
compare the performance of the (t, /) features with the corre- 
sponding frequency-domain features for the detection of seizure 
activity [74]. The area under ROC curve (AUC) is used as a per- 
formance metric. The AUC of a feature is the probability that the 
feature would have a higher value for a randomly chosen positive 
example than for a randomly chosen negative example [74]. A set 
of 200 epochs of newborn EEG background and a set of 200 epochs 
of newborn EEG seizure are randomly extracted from a database 
for this study. The length of each segment of database is 256 sam- 
ples (with 32 Hz sampling frequency), corresponding to a duration 
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Table 3 

AUC analysis of the frequency-domain features. 


Features 


AUC values 


Spectral Flatness 


0.6632 


Spectral Entropy 


0.5901 


Spectral Flux 


0.5483 



Table 4 

AUC analysis of the (t, /) features obtained by extending the frequency-domain fea- 
tures to the (t, /) domain. These features are extracted from the WVD, Spectrogram, 
ADTFD, EMBD, and CKD. 



Features 


ADTFD 


CKD 


EMBD 


WVD 


Spectrogram 


( t , /) Flatness 


0.9092 


0.7997 


0.7610 


0.6922 


0.6254 


(t, f) Entropy 


0.6631 


0.7207 


0.7228 


0.6157 


0.7200 


(t, /) Flux 


0.7792 


0.7103 


0.5490 


0.7344 


0.6280 



of 8 s. The relevant database is described in [75], and the relevant 
components used in this study are provided as Supplementary ma- 
terial. 

In order to extract EEG features, each segment is analyzed by 
selected TFDs; these are the Spectrogram (SPEC), WVD, EMBD, 
CKD, and ADTFD. The parameters that lead to the highest AUC 
values are selected for each TFD by performing an experimental 
non-exhaustive search. The selected smoothing parameters for the 
EMBD are a = 0.05 and p = 0 . 05 . The selected window length L 
for the SPEC is the Hamming window of L = 75 samples. The se- 
lected parameters for the CKD are c = 1, D = 0.04 and E = 0 . 04 . 
The selected parameters for the ADTFD are a = 3 and b = 8. Note 
that we have not used the MBD and CSK-TFD for performance eval- 
uation as they are special cases of the EMBD and CKD as discussed 
in Section 2 . The performance of ( t , /) features is compared with 
frequency-domain features using an AUC analysis. Table 3 shows 
AUC values for the frequency-domain features while AUC values 
for the corresponding (t, /) extensions are indicated in Table 4 . 

The key observations regarding the AUC analysis of signal re- 
lated features as observed from Tables 3 and 4 are given below. 

1. (t, /) features, which are extensions of the frequency-domain 
features, lead to higher AUC values than corresponding fre- 
quency-domain features for all TFDs with the exception of the 
Spectrogram. The Spectrogram has lower AUC value for the 
(t, /) flatness than the corresponding frequency-domain fea- 
ture i.e., the spectral flatness. 

2. The (t, f) flatness is the best performing (t, /) feature with 
AUC value of 0.9092 for the adaptive TFD. 

3. The adaptive TFD is the most suitable TFD for extracting (t, /) 
flatness and (t, /) flux because of higher AUC values of these 
features for the adaptive TFD; the CKD is the second best TFD 
for (t, /) flatness while the WVD is the second best TFD for 
(t, /) flux. 

4. The EMBD is the best TFD for extracting the (t, /) entropy 
followed by the CKD. 

5. The poor performance of the Spectrogram for extracting the 
(t, /) flatness can be explained by the sensitivity of the Spec- 
trogram to its window length, which makes it difficult to find 
an optimal window for a large class of signals. 

6.2. Discussion and interpretation of results 

The experimental findings indicate that the performance of 
frequency-domain features can be significantly improved by ex- 
tending them to the (t, /) domain. It is important to use high- 
resolution TFDs for extracting (t, /) features as the best perform- 
ing (t, f) features are extracted from the ADTFD, which is also the 
best performing TFD in terms of resolution and cross-term sup- 
pression capabilities. These observations indicate that, for a given 



TFD, the performance of a (t, /) abnormality classification tech- 
nique can be directly related to the resolution of a TFD; hence the 
need to define data-dependent high-resolution QTFDs suitable for 
specific classes of signals. The improvements obtained as a result 
of (t, /) translation of features can be explained as follows: 

• The Spectral flatness and Spectral entropy measure the unifor- 
mity of energy distribution in the frequency-domain. They can 
discriminate a narrowband signal from a wideband signal, but 
they cannot discriminate two wideband signals; e.g. an LFM 
signal cannot be discriminated from white noise. The (t, /) 
flatness and (t, /) entropy can discriminate random noise from 
signals whose energy is concentrated in the ( t , /) domain. 
The energy of EEG seizure signals is usually concentrated 
along the IFs of their components, whereas the energy of EEG 
background is usually spread in the (t, /) domain. Therefore, 
seizure signals have lower (t, /) flatness and (t, /) entropy as 
compared to background signals. 

• The better performance of the (t, /) entropy can be explained 
by the observation that EEG seizure signals have slow variation 
of signal energy along both time and frequency axis, whereas 
the background has a random distribution of signal energy. 
This implies that for seizure signals there is a little variation of 
signal energy along the diagonal axis. The (t, /) flux exploits 
this information by measuring the total variation in the sig- 
nal energy along the diagonal axis, which results in low values 
for seizure signals and higher values for the background. On 
the other hand, the spectral flux only measures the variation 
in spectral content of EEG signals and cannot measure the rate 
of change of signal energy along the diagonal axis. 

In order to extend and apply the proposed (t, /) pattern recogni- 
tion approach to make it suitable for a wide range of applications 
and real-time implementation, several key points need further in- 
vestigation, as described next. 

6 . 2 . Additional time-frequency insights to further improve performance 

6 . 2 . 2 . Designing high-resolution adaptive TFDs 

The performance of all QTFDs depends on the choice of certain 
parameters. For example, the shape and size of an analysis window 
controls the time and frequency resolution of the Spectrogram. 
These parameters are data dependent and one set of parameters 
cannot give optimal results for all classes of data. Moreover, it is 
difficult to optimize a single global kernel for all the points in the 
TFD as there can be many components in a signal with different 
directions of energy concentration. In order to automatically select 
the optimal parameters that lead to a high energy concentration 
TFD in which cross-terms and noise are reduced, several adaptive 
methods were proposed [58,61,59,76]. These methods (e.g., [61, 
59]) automatically adapt the parameters of the (t, f) kernel at each 
time instant, but there is a need in some situations to extend these 
algorithms so that the kernel parameters are optimized locally at 
each (t, f) point as done in Section 4.2. 

6.2.2. Designing multidirectional fixed-kernel TFDs 

Separable-kernel TFDs give good performance for signals whose 
energy is either concentrated along the time axis or the frequency 
axis in the (t, /) domain or along a direction with small deviation 
from one of these two axes. However, these TFDs fail to give opti- 
mal energy concentration for signals whose energy is concentrated 
along a direction significantly away from both time axis and fre- 
quency axis in the (t, /) domain. In order to obtain high-resolution 
(t, /) representation for such signals, a rotation parameter can be 
included in the formulation of smoothing kernels [63]. For exam- 
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(c) 

Fig. 25. The (t, f ) representations of a multi-component signal composed of signal components of varying amplitudes and chirp rates, (a) The CKD (c = 0.1, E = 0.065, 
D = 0.065). (b) The multi-directional kernel (6>i = 0°, 0 2 = 30°, c = 0.1, D — 0.03). (c) The ADTFD (a = 3, b = 8). 



pie, an additional parameter 0 can be included to allow for the 
rotation of the CSK, resulting in the following expression. 

{ , v cost#)— r sin(0) .o 7 , . . „ „ 

e ( d > , | cos(0)v — sin(0)r| < D, 

0, otherwise, 



CKD fails to resolve the close components due to the lack of di- 
rection parameter in its formulation. The adaptive TFD, due to 
the local adaptation of its kernel, has resolved the close compo- 
nents without degrading the energy concentration for the Gaussian 
atoms. This MDD should therefore be a preferred TFD for non- 
stationary multi-component and multi-directional signals such as 
(65) EEG seizures. 



where 0 is the angle of the kernel with the Doppler axis in the 
ambiguity domain or the time axis in the (t, /) domain. Rotated 
(t, /) kernels cannot give ideal representations for signals that 
have more than one possible directions of energy concentration 
in the (t, /) domain. Adaptive kernel methods, such as the one de- 
fined in Section 4.2.2, are well suited for such signals, but these 
methods are computationally expensive. One way to analyze sig- 
nals with multiple directions of energy concentration in the (t, /) 
domain is to perform smoothing in multiple directions. An exam- 
ple is provided by the following expression which defines a new 
multidirectional kernel as a summation of a predetermined num- 
ber of directional kernels (as defined in Eq. (65)): 

e c Nd 

g(v,r) = — Y'g(v,T,e i ), (66) 

N ° ,=1 

where No is the total number of directional kernels, and 0; is the 
angle of the ith kernel. 

In order to compare the performance of this multi-directional 
distribution (MDD) with the other methods, let us use the multi- 
component signal expressed in Eq. (56). This signal is selected as 
it has multiple directions of energy concentration in the (t, /) 
domain, so conventional methods that do not take direction into 
account cannot be used to optimally analyze this signal. Fig. 25 
shows this signal analyzed using the MDD and the two best per- 
forming TFDs that are the CKD and adaptive TFD. The MDD gives 
highest energy concentration for the two parallel LFM signals and 
tone, but fails to concentrate energy for the Gaussian atom. The 
lowest region of performance of the MDD kernel concerns the 
Gaussian atom, which is due to a lack of specific direction. The 



6.2.3. Time-frequency image enhancement of TFDs 

Some of the state of the art (t, /) image enhancement tech- 
niques discussed in Section 4 and others such as [77] can be used 
as post processing operations. The aim is either to improve the 
resolution of cross-term-free low resolution TFDs (e.g., the Spec- 
trogram) by using de-blurring methods or to suppress cross-terms 
in high-resolution TFDs (e.g., the WVD). By including such image 
enhancement techniques within the overall methodology of abnor- 
mality detection, improvements in performance can be obtained in 
applications such as classification of physiological signals. For ex- 
ample, this study illustrates that the two best (t, /) features (i.e., 
(t, /) flatness and (t, /) flux) for the detection of seizure activity in 
EEG signals are obtained from (t, /) images enhanced by the DGF 
(i.e. the ADTFD). The expense for this result is an additional com- 
putational cost, whereas among the non-adaptive methods (i.e., 
without enhancement) the CKD gives the best performing features 
and this method is computationally less expensive. 

6.2.4. Extraction of geometric features 

Signal components appear as pockets of energy concentration in 
the (t, f) domain or as segments in (t, f) images. In these cases 
image segmentation techniques can then be used to locate these 
pockets of energy concentration [78 . Geometric features can then 
be extracted from these segments to get information such as the 
number of signal components, their relative positions in the ( t , f) 
plane and the energy distribution of each signal component in the 
(t, /) plane. This information is critical for abnormality detection 
using change analysis. For example, in the case of a heart sound 
signal, the murmur appears as an additional pocket of energy in 
the ( t , f) domain (see Fig. 4 in [5]). So, the number of pockets 
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Fig. 26. The proposed block diagram of a multi-channel (t, f ) abnormality detection 
methodology using channel-by-channel approach. 




Fig. 27. The proposed block diagram of the multi-channel (t, /) abnormality de- 
tection methodology using spatial TFDs. Arrows in red indicate cross-TFDs, while 
arrows in blue indicate auto-TFDs. 

of energy concentration, their location and their geometric prop- 
erties can be used as features in machine learning algorithms for 
automatically detecting an abnormality in a heart sound signal. 

6.2.5. Extraction of features from multi-channel signals 

In many cases, multi-channel and multi-modal recordings are 
produced, as in the case of physiological signals. Let us consider 
two approaches for extracting (t, /) features from multi-channel 
physiological signals, as outlined below. 

1. Analyze each channel separately using a high-resolution TFD 
and then extract features from the TFD as shown in Fig. 26. 
This approach ignores mutual information among various 
channels. 

2. Analyze a multi-channel signal using a spatial TFD (STFD) 
and then extract features from the STFD matrix as shown in 
Fig. 27. For a multi-channel signal x[n\ = [x\ [n], * 2 [n], ^[n], . . . 
Xp[n]], an STFD matrix for time index n and frequency index /< 
is defined as ([33], pp. 325): 

M N 

p[n,k]= E E G[n - p, m]x[p + m]x H 

m=—M p=—N 

x[p-m]e- 4i7t ^, ( 67 ) 



where G[n,m ] is a time-lag kernel. Each element of a STFD 
matrix is defined as: 

M N 

[p[n,k]] it = E E Cln- p,m]Xi[p + m]x* 

m=—M p=—N 

x [p - m]e -4j7r ^. (68) 

The off-diagonal elements of the STFD are the cross-sensor 
TFDs and the diagonal entries are the standard auto-sensor 
TFDs of the multi-channel non-stationary signals. Such STFD- 
based methods can exploit both the non-stationary character- 
istics of the signals and the spatial diversity provided by the 
multiple sensors. This is of course at the expense of complex- 
ity and a large number of features. 

The aforementioned feature extraction approaches can be easily 
adapted for multi-modal signals by interpreting signals of other 
modality acquired from other sensors as separate channels. 

7. Conclusion 

This tutorial review paper describes two key methods and 
techniques needed to design (t, /) pattern recognition techniques 
suitable for the classification of non-stationary signals such as 
those encountered in physiological measurements. The two prin- 
cipal stages include: 

1. Design of high-resolution TFDs for better extraction of infor- 
mation defining discriminatory features. 

2 . Definition of new (t, /) features by extending traditional time- 
domain or frequency-domain features to (t, f) features such as 
(t, f) flatness and (t, /) flux. 

The performance of the ( t , f) features is evaluated using ROC anal- 
ysis. The findings show that (t, f) features perform significantly 
better than frequency-domain features; and high-resolution TFDs 
result in (t, f) features with higher AUC. The other findings of this 
study with respect to TFD design are: 

1. The Spectrogram is easy to use as it depends only on the win- 
dow size and shape; but it is too sensitive to the window 
variations. Improvements on the Spectrogram require the use 
of separable kernel TFDs with more control parameters as in 
the EMBD, CKD, multi-view TFD, adaptive TFD. 

2 . LI-TFDs are more suitable for signals that can be modeled as 
pure tones, i.e., signals with energy concentration parallel to 
the time axis in the (t, /) domain or small deviation from the 
time axis; this includes, e.g., some EEG seizure signals. 

3. DI-TFDs are more suitable for signals that can be modeled as 
a train of impulses, i.e., signals whose energy concentration 
is parallel to the frequency axis in the (t, /) domain; this in- 
cludes, e.g., spike signals. 

4. Separable-kernel TFDs have more degrees of freedom to adapt 
their smoothing kernels as compared to the Spectrogram, DI- 
TFDs and LI TFDs. They can be optimized for both signals 
composed of spikes as well as pure tones. 

5. The CKD is the best performing separable-kernel TFD as it has 
more control parameters to adapt both the shape and size of 
its smoothing kernel. It is an extension of the EMBD which 
is itself an extension of the MBD, a modified version of the 
BD. This best performance is obtained at the cost of greater 
complexity. 

6. The adaptive TFD is an extra refinement that is especially suit- 
able for signals that have more than one direction of energy 
distribution in the (t, /) plane. It can be applied to the best 
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high-resolution TFD selected for a large class of signals; its 
improved performance is obtained at the expense of an ad- 
ditional computational cost. 

In order to further improve the performance of existing ( t , /) ap- 
proaches with respect to pattern recognition and machine learning 
requirements, several additional investigations are needed includ- 
ing: 

1. Developing algorithms for the automatic design of data de- 
pendent high-resolution TFDs and IF estimation, taking into 
account directional filtering of TFDs on the basis of energy 
concentration location criteria. 

2. Investigating the further use of directional filtering to extract 
texture related features. 

3. Applying (t, /) image segmentation techniques to extract geo- 
metric features to facilitate 1) and 2) [78]. 

4. Investigating the design of TFDs for compressed sensing [79]. 

The results from these additional investigations should help bring 
these techniques to a wider range of applications, and enable ad- 
vanced machine learning in a ( t , /) domain. In particular, the 
above methodologies and results presented in this paper indicate 
that in the context of detecting abnormalities in newborn brains, 
systems that aid experts decision making can be implemented for 
use in neonatal intensive care units on a trial basis. The same is 
true for other applications such as condition monitoring and fault 
detection in a wide range of industrial applications. 

The Appendices A-E are provided for completeness and to en- 
sure that the results presented can be reproduced independently 
by the reader. 
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Appendix A. Resolution performance measure used 
for comparisons of TFDs 

The normalized performance measure P introduced in [38] 
takes into account the proportion of cross-terms in the TFD and 
provides a measure of resolution of the TFD, as illustrated in 
Fig. 28. The measure ‘P’ is defined as: 




Fig. 28. The two dominant peaks are the signal components; the middle peak is the 
cross-term, and the other peaks are side lobes. 

followed by signal classification using a trained classifier. The cost 
of feature selection is not considered as this step is part of of- 
fline training. The computational cost of implementing a TFD is 
0(NM log(M)), where N is the number of samples of the signal 
being considered and M is the number of sample points in the 
FFT computations. The computation of a TFD requires N FT op- 
erations (one at each time instant) and each FT is of the order 
0(Mlog(M)). The computational complexity of feature extraction 
in general is 0 (NM) as there are NM points in the (t, /) plane 
and the number of mathematical operations required to extract 
any given feature is proportional to the total number of points. 
The computational complexity of abnormality detection based on 
the trained classifier is 0(Nf ); where Nf is the total number 
of features. The total computational cost of online computation 
is therefore 0(NMlog(M) -h NM -h Nf) or simply 0(NMlog(M)), 
given that, in the analysis of algorithms, computational costs of 
higher order is only considered as per general practice [80]. The 
above analysis indicates that the TFD computation is the major 
bottleneck in trying to achieve an efficient real-time implementa- 
tion. Further optimization can be made by using specially designed 
computational and memory efficient algorithms that have been de- 
veloped for TFD implementation [69], implementing ( t , f) kernels 
as multi-taper Spectrograms [81], and using efficient hardware im- 
plementations such as field-programmable gate arrays. 






Astt) 

Am(t) 



1 

2 A m (t) 




(69) 



where A m (t) and A s (t) are respectively the average amplitudes of 
the components’ mainlobes and sidelobes, A x (t) is the cross-term 
amplitude. The parameter S(t) = (/ 2 (f) - - (/i(0 + 

represents the component’s separation measure and D(t ) = / 2 (f) — 
/1 (t) is the distance between the peaks of the two components. 



Appendix C. Computer programs in this study 

All simulations have been carried out using Matlab and the TF- 
SAP tool-box. The TFSAP tool-box can be downloaded from the fol- 
lowing link. http://espace.library.uq.edu.aU/view/uq:211321, http:// 
faculty.qu.edu.qa/boualem/tfsa.aspx, and/or http://www.time- 
frequency.net. The corresponding code can be found in [82,33]. 



Appendix B. Computational complexity and real-time 
implementations 

The implementation of the abnormality detection methodolo- 
gies described above involves two stages: offline training and on- 
line implementation. The computational complexity of the online 
implementation is often a major bottle neck in practical realiza- 
tions of pattern recognition schemes. For the algorithms described 
above, this involves TFDs computations 69], features extraction, 



Appendix D. Time-frequency extension of time-domain features 

Table 5 gives the list of (t, /) features that are obtained by 
translation of time-only features to the (t, /) domain. 

Appendix E. Medical data used in this study 

Supplementary material related to this article can be found on- 
line at http://dx.doi.Org/10.1016/j.dsp.2014.12.015. 
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Table 5 

Time-frequency extension of time-domain features. 



Feature 


Time- 


domain representation 


(t, /) extension 




Mean 


m (t) = 




m (tJ) = 




Variance 


a (t) = 


h EjLi < S M - m rn) 2 


a (t,f) = 


m E?=i EkL 1 (Pin, k] - m (t j)) 2 


Skewness 


Y(t) = 


iVCT( t) 


= 


1 y^N y^M 

NMcr ( 3 f f) ^ n = 1 2^k=; 


i (p[n,k\-m (tJ) ) 3 


Kurtosis 


k (t ) - 


T.L 1 < S M - m ®) 4 


k (t,f ) = 


1 y^N y^M 

NMcr* tf) ^ n = 1 ^ k=1 


( p[n,k\-m (tJ) ) 4 


Coefficient of variation 


c (t) - 


°V) 

m (t) 


c (tJ) = 
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